Part 1 – The journey to context-based Policy and Access Enforcement Services
An organization’s data landscape often contains information that is sensitive and may need to be masked or withheld from viewing. This is due to regulations or organizational policies that require data access to be carefully controlled. Typical regulations that affect data access include the General Data Protection Regulation (GDPR), the Health Insurance Portability and Accountability Act (HIPAA), and the Payment Card Industry Data Security Standard (PCI-DSS).
There are many ways to implement policy and access enforcement. In this blog, I illustrate four scenarios to show how policy and access enforcement works, and then elaborate on two key capabilities that promote consistent and flexible Policy and Access Enforcement Services (PAES). The first is the introduction of enterprise metadata, which provides clarity and agility when the data landscape is changeable and complex. The second is the use of context to enable flexible access policies. The four scenarios are:
- No security layer
- Introduction of a Policy and Access Enforcement Services (PAES) layer
- Use of metadata classification to drive policy and access enforcement
- Context-based access and policy enforcement driven by metadata
Please also refer to the webinar which covers and further illustrates the approaches advocated in this blog.
1. No Policy and Access Enforcement Services (PAES) layer
Individual data store or application policy enforcement
This is the most basic model for implementing policy and access control. Each application or data store administrator is responsible for setting the policies and access enforcement independently.
Setting up policy and access enforcement for a single data store or application is fairly straightforward. This works well when an organization has only a handful of applications and data stores.
The following diagram shows an example of this approach.
Pros– This is fine for small organizations where there is a good understanding of all the applications and data stores, as it may be feasible to keep the policies in line manually. With a low number of applications and data stores, there is typically a reasonable understanding of the content, data provenance, and data locations.
Cons– In most organizations, it is difficult to ensure all application and data store owners are in sync regarding the interpretation and implementation of company access policies. When there are many applications and data stores, it becomes very challenging to identify related data items between them, so data mismatches can easily occur. Setting up security for each individual application and data store is repetitive and does not promote consistency.
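The consistency problem can be made concrete with a minimal Python sketch. The store names, column names, and roles are hypothetical, and the sketch assumes that related columns already share a name across stores, which in practice is rarely the case.

```python
from typing import Dict, Set

# Hypothetical per-store policies (column -> roles allowed to read it),
# each maintained independently by a different administrator.
STORE_POLICIES: Dict[str, Dict[str, Set[str]]] = {
    "hr_database": {
        "salary": {"hr_admin"},
        "employee_name": {"hr_admin", "manager"},
    },
    "payroll_app": {
        "salary": {"hr_admin", "finance", "manager"},
        "employee_name": {"hr_admin", "manager"},
    },
}


def find_policy_drift(
    policies: Dict[str, Dict[str, Set[str]]]
) -> Dict[str, Dict[str, Set[str]]]:
    """Return the columns whose allowed roles differ between stores."""
    by_column: Dict[str, Dict[str, Set[str]]] = {}
    for store, columns in policies.items():
        for column, roles in columns.items():
            by_column.setdefault(column, {})[store] = roles

    drift = {}
    for column, per_store in by_column.items():
        role_sets = list(per_store.values())
        if any(roles != role_sets[0] for roles in role_sets[1:]):
            drift[column] = per_store
    return drift


print(sorted(find_policy_drift(STORE_POLICIES)))  # prints ['salary']
```

With only two stores the mismatch on salary is easy to spot by eye. The same check across hundreds of independently administered stores is where the manual approach breaks down.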
2. Introduction of a policy and access enforcement tool
Dealing with many applications and data stores
Organizations typically have hundreds of data stores and applications, each requiring access and enforcement policies. Maintaining a consistent PAES across the data landscape is essential to meet regulations and company policies. However, this is often a challenge as the data is not stored or named in the same way across different applications and data stores. This issue is compounded when the data landscape is complicated or not well understood due to:
- additional applications and data stores resulting from company mergers
- uncontrolled growth or sprawl of applications and data stores
- teams which purchase shadow IT
- an environment where users can duplicate data, or
- data that is provided by a third party or licensed for limited use.
When the variety and number of technologies involved grow, technology-specific PAES become increasingly challenging, and consistency is almost impossible. A single PAES layer that spans an organization’s data stores is required in this situation. A typical PAES creates a number of user profiles that enable data access at a granular level.
Introducing a policy and access enforcement layer
One key motivation for introducing a policy and access enforcement layer is to give data consumers the access to data they need to do their jobs. Yet this can be tricky for a couple of reasons:
- if that data has multiple types of sensitivities that need to be accounted for
- if there is no understanding of where the data is and what its provenance is
To facilitate many types of access, multiple access groups are created with different profiles for different data consumers.
By introducing a policy and access enforcement layer as identified in the blue box in the above diagram, it is now possible for a centralized team to tackle the problem of identifying the access policy for each data asset.
Pros– This approach enables a unified set of policies and access enforcement rules across the entire enterprise landscape.
Cons– Unifying the application and data landscape through a PAES layer can be complex. The security officers have to classify the data in multiple siloed repositories, often with little to no insight into the data, which can be a huge challenge. For example, when the data object person_id is classified as PII (Personally Identifiable Information) in the first data store, does this imply the same sensitivity profile for customer_id, staff_id or gold_cust_id in other data stores? Ensuring consistency is essential but challenging when there is no enterprise view of the metadata.
3. Use of metadata classification to drive policy and access enforcement
The introduction of an enterprise-wide metadata capability brings an understanding of the data in all applications and data stores. In this blog, we have used ODPi Egeria, which is an Open Metadata Standard and set of capabilities that enables the free flow of metadata between repositories from any vendor. Egeria additionally provides the ability to stitch together the metadata between all the metadata repositories, ultimately enabling lineage analysis, data provenance, and more.
By labeling and tagging metadata, Egeria can provide context and insights for the data being requested by data consumers. It is feasible for asset owners to identify what data is, what restrictions may apply to it and whether it is sensitive. A project team or manager could also tag the data to identify which projects it is associated with and can be consumed by. For more details on how Egeria handles metadata, please refer to the “Part 2 – New approaches to managing access to sensitive data” blog for a deep dive on metadata.
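The idea of driving policy from shared classifications, rather than from per-store column names, can be sketched as follows. This is a minimal illustration in plain Python and not Egeria’s API: the catalog dictionary, tag names, and roles are all hypothetical stand-ins for what a metadata repository would supply.

```python
from typing import Dict, Set, Tuple

# Hypothetical catalog: (data store, column) -> classification tags.
# Differently named columns share one tag once they are classified.
CATALOG: Dict[Tuple[str, str], Set[str]] = {
    ("crm", "customer_id"): {"PII"},
    ("hr", "staff_id"): {"PII"},
    ("loyalty", "gold_cust_id"): {"PII"},
    ("hr", "salary"): {"Confidential"},
    ("crm", "region"): set(),
}

# One policy per tag, instead of one per column per store (assumed roles).
TAG_POLICY: Dict[str, Set[str]] = {
    "PII": {"privacy_officer"},
    "Confidential": {"hr_admin", "privacy_officer"},
}


def can_read(role: str, store: str, column: str) -> bool:
    """Allow access only if the role is permitted for every tag on the column."""
    tags = CATALOG.get((store, column))
    if tags is None:
        return False  # asset not in the catalog: deny by default
    return all(role in TAG_POLICY.get(tag, set()) for tag in tags)


print(can_read("privacy_officer", "loyalty", "gold_cust_id"))
print(can_read("hr_admin", "hr", "staff_id"))
```

Because the policy is attached to the tag, adding a new data store only requires classifying its columns. No new access rules need to be written.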
Pros– Egeria creates an enterprise-wide view or catalog of all the metadata assets. The definitions (held as tagged classifications or security tags) can be used by the policy authoring tools. This makes the process of identifying data items a simple task, which requires little or no “second-guessing” by the Policy and Access Enforcement officer. Having access to metadata brings greater understanding and consistency, which saves the security officer much time and effort. It also reduces inconsistencies and enables context policies to be easily added.
By identifying which projects data can be consumed by, there is an additional identifier which can determine the purpose or context of that data.
Cons– A metadata layer has to be set up across the enterprise. This may, for many organizations, mean that cultural changes are required as the metadata becomes an enterprise-wide asset and not siloed hidden gems.
This scenario does not allow query time contextual data access or complex fine-grained policies to be applied. That capability is added in the next scenario.
4. Context-based access and policy enforcement driven by metadata
As we have seen, policies are created when data access needs to be restricted. However, these policies can make it challenging for data consumers to get access to data they may require. Typically, when this happens, data consumers will need a higher level of security privileges to access that data. However, this higher level can be a concern as the data consumer may have access to a much wider set of data than is appropriate. In many cases, context-based access provides an appropriate means to determine policy and access enforcement when a data request is made.
How does context work?
With each data request, a query time context is now submitted, which dynamically determines the access policy specific to the data request.
The query time context can include details such as:
- the reason for accessing the data,
- the location of the access, or
- even the application accessing the data, which may have additional protections allowing for a more permissive access policy for that application.
In the future, this could also extend to knowledge of other data items the user has already retrieved, which could be factored into decisions around access to further data sets. The combination of the query time context and user attributes can then be used by the policy and access enforcement layer to take into account the circumstances of that specific data request.
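A query time context of this kind could be modeled as a small structure submitted with each request. The sketch below is an assumption for illustration only: the class name, the purpose and application lists, the "office" location value, and the four access levels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values; a business would define its own lists.
ALLOWED_PURPOSES = {"pay_equality_analysis", "health_check_eligibility"}
TRUSTED_APPLICATIONS = {"secure_analytics_workbench"}


@dataclass(frozen=True)
class QueryContext:
    purpose: str                       # the reason for accessing the data
    location: str                      # where the request originates
    application: Optional[str] = None  # the application making the request


def access_level(context: QueryContext) -> str:
    """Map a query time context to a coarse access level (illustrative only)."""
    if context.purpose not in ALLOWED_PURPOSES:
        return "deny"
    if context.location != "office":
        return "restricted"
    if context.application in TRUSTED_APPLICATIONS:
        return "permissive"  # extra protections in the app allow more access
    return "standard"


request = QueryContext("pay_equality_analysis", "office", "secure_analytics_workbench")
print(access_level(request))
```

The decision is recomputed for every request, so the same user can receive different levels of access depending on why, where, and through which application the data is requested.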
For example, a data consumer may be restricted from viewing the data elements – Employee Name, Salary, Employee bank details – together. However, when the data is not combined in a single view but viewed as individual items, each is relatively meaningless, as they are not personally identifiable.
It is important to identify what data is sensitive (or restricted) and why. For example, salary is sensitive and should be restricted when shared with UserID as this is personal information. However, salary on its own with no details that can link it to an individual is quite harmless; the context makes quite a difference. Having the ability to set context-based rules adds much flexibility that makes it feasible for data consumers to do their jobs safely.
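A rule about combinations of fields, as opposed to individual fields, can be sketched like this. The field names and the two restricted combinations are taken from the examples above. The function names are hypothetical, and the sketch only looks at a single request, not at data the user has retrieved earlier.

```python
from typing import FrozenSet, Iterable, List, Set

# Combinations of fields that must not be returned together in one view.
RESTRICTED_COMBINATIONS: List[FrozenSet[str]] = [
    frozenset({"employee_name", "salary", "bank_details"}),
    frozenset({"user_id", "salary"}),
]


def violated_combinations(requested_fields: Iterable[str]) -> List[Set[str]]:
    """Return every restricted combination fully contained in the request."""
    requested = set(requested_fields)
    return [set(combo) for combo in RESTRICTED_COMBINATIONS if combo <= requested]


def is_allowed(requested_fields: Iterable[str]) -> bool:
    """A request is allowed when it completes no restricted combination."""
    return not violated_combinations(requested_fields)


print(is_allowed(["salary", "job_role"]))  # salary without an identifier
print(is_allowed(["user_id", "salary"]))   # salary linked to an individual
```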
In this scenario, context-based enforcement is provided by Palisade, which offers:
- A consistent way to define and enforce complex data access policies.
- It only requires lightweight connectors to support new data storage and new data processing technologies.
- The architecture has been designed to scale to meet data science needs to process very large volumes of data in a single request using distributed technologies.
For more details, refer to the Palisade project on GitHub.
This example additionally uses Egeria metadata, discussed in the three scenarios above.
What is going on behind the scenes
In this final example, data requests are routed via a context-based PAES layer. This then makes use of the user-supplied context to set the data consumers’ access rights per data request. This enables each data request to be evaluated based on:
- the contextual information about the query
- the data consumer’s attributes
- the metadata labels on the data
This is a very practical approach to enabling policies to have enhanced flexibility based on the metadata labels. Here the data consumer’s access rights are determined by their profile, the data requested, and any other context attributes that are considered.
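The three inputs to each decision (query context, consumer attributes, and metadata labels) can be sketched as a simple rule evaluation. This is a minimal illustration under stated assumptions and not how Palisade or Egeria implement it: the Rule structure, label names, purposes, and roles are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Rule:
    label: str                # metadata label the rule applies to
    purposes: FrozenSet[str]  # query contexts in which access is allowed
    roles: FrozenSet[str]     # consumer roles for which access is allowed


# Hypothetical metadata labels per field, as a catalog might supply them.
FIELD_LABELS: Dict[str, Set[str]] = {
    "employee_name": {"PII"},
    "salary": {"Confidential"},
    "job_role": set(),
}

RULES: List[Rule] = [
    Rule("PII", frozenset({"health_check_eligibility"}), frozenset({"data_scientist"})),
    Rule("Confidential", frozenset({"pay_equality_analysis"}), frozenset({"data_scientist"})),
]


def decide(fields: Iterable[str], purpose: str, role: str) -> Dict[str, bool]:
    """Return a per-field decision; every label on a field needs a matching rule."""
    decisions: Dict[str, bool] = {}
    for field in fields:
        labels = FIELD_LABELS.get(field)
        if labels is None:
            decisions[field] = False  # field has no metadata: deny by default
            continue
        decisions[field] = all(
            any(
                rule.label == label and purpose in rule.purposes and role in rule.roles
                for rule in RULES
            )
            for label in labels
        )
    return decisions


print(decide(["employee_name", "salary", "job_role"], "pay_equality_analysis", "data_scientist"))
```

Here the rules refer only to labels, never to field names, so the same rules apply to any field in any store that carries the label.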
An example of advanced metadata to drive context-based access and policy enforcement
A data scientist has two data investigations to carry out for the HR department. Each activity requires access to overlapping and sensitive data:
Activity 1 – Salary analysis to investigate pay equality for staff to identify pay biases.
In this scenario, the fields that identify an individual, such as name, ID, and address, should be redacted. The data scientist would need to see pay, band, gender, job role, nationality, age, years of service, etc.
Activity 2 – Identify staff who are eligible for a free health check, based on length of service (over five years).
The data scientist will need access to employee name, id, location, email address, and joining date.
Each activity requires conflicting data, and most of this data would normally be redacted as it is personally sensitive. It is not desirable to share the complete personnel data files with research staff as they would have access to their colleagues’ personal information. What is more appropriate is to share subsets of the data based on the context of the research according to the purpose.
With the addition of context, the data consumer can now run both sets of data analysis without any access issues. See the chart below, which illustrates the company definition of restricted data (default security), and then shows what access can be set by context or purpose.
Green cells represent that data access should be allowed.
You will notice that there is very little in common between the two queries; each requires the data consumer’s access to differ from the default access permissions.
With the use of Palisade, the context and user attributes determine the access to these fields. Additionally, the context is logged for audit purposes, providing an insight into who is accessing data, and under what context.
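The two activities can be sketched as purpose-driven redaction with an audit trail. This is an illustration in plain Python, not Palisade’s API. The visible fields per purpose follow the two activities described above. The field identifiers, the redaction marker, the sample record, and the choice to redact everything for an unknown purpose are assumptions.

```python
from typing import Any, Dict, List, Set

# Fields visible for each purpose, following Activity 1 and Activity 2.
VISIBLE_BY_PURPOSE: Dict[str, Set[str]] = {
    "pay_equality_analysis": {
        "pay", "band", "gender", "job_role", "nationality", "age", "years_of_service",
    },
    "health_check_eligibility": {
        "name", "id", "location", "email", "joining_date",
    },
}

REDACTED = "***"
AUDIT_LOG: List[Dict[str, str]] = []  # who accessed data, and under what context


def read_record(record: Dict[str, Any], user: str, purpose: str) -> Dict[str, Any]:
    """Redact every field not permitted for the purpose and log the request."""
    visible = VISIBLE_BY_PURPOSE.get(purpose, set())  # unknown purpose: redact all
    AUDIT_LOG.append({"user": user, "purpose": purpose})
    return {key: (value if key in visible else REDACTED) for key, value in record.items()}


# Made-up sample record for illustration.
employee = {
    "name": "A. Example", "id": "E001", "address": "1 Sample Street",
    "pay": 50000, "band": "B2", "gender": "F", "joining_date": "2012-03-01",
}

print(read_record(employee, "data_scientist_1", "pay_equality_analysis"))
print(read_record(employee, "data_scientist_1", "health_check_eligibility"))
```

The same user reads the same record twice and sees two different subsets, and each request leaves an entry in the audit log recording the stated purpose.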
For more details on how Palisade identifies what data can be shared, please refer to the Palisade Context Overview Blog.
Pros – In this example, access to data is tailored to the request submitted by the data consumer based on context as well as user attributes. This means the data consumer has greater access to suitable data, does not require a high-level security clearance, and does not have to apply for changes to their access policy to be made. Further, the Data Governance and Security team can set more flexible data access policies from a basis of understanding. The metadata with security tags and other labels provides a great insight into the data.
Cons– This method is quite sophisticated and requires an extra step when the data consumer requests data, as they now have to provide additional contextual information. For example, they may need to select a purpose from a pre-defined drop-down list as determined by the business. In addition, the Governance or Security Officer will need to update many policies to take advantage of the greater flexibility that the contextual information provides.
Organizations’ desire to be data-driven means their data consumers and decision-makers require greater access to data. It can be challenging to design flexible access policies given the many regulations that restrict access to data. Using a context-based approach is an effective way to enable flexible access while maintaining secure policy and access enforcement. This approach helps organizations avoid the cost multiplication associated with traditional application integration as they become data-driven across the enterprise.
To enable context-based access, it is essential to have a comprehensive enterprise-wide metadata governance capability in place. All the data in the enterprise needs to be suitably labeled so that context-based rules can be created. The ultimate aim is to enable widespread data access within the bounds of regulations and controls.
Without automated context-based access, getting to the data can be tedious and difficult. If the data cannot be accessed, the data consumer may need to create a manual request to access data. Worse, they may find alternate ways to get at the data, putting themselves and the organization at risk of data breaches!
Get in touch and watch the webinar
Click here to view the Webex that accompanies this blog.
To Join the Egeria Project or Slack Channel, check out the following links: