Part 2 – Focus on Palisade for Context-Based Policy and Access
Being able to do your job easily, efficiently, and without hassle is important because when you can’t, work quickly becomes frustrating, leaving most of us to wonder, “Why do we bother?”
The cause is often regulations, company policies, or the disjointed, complex nature of an organization’s systems.
For data scientists, researchers, and analysts, work can be frustrating: when they finally locate the data needed to do their jobs, they are told, “No – access is restricted – please refer to the security officer!” The implementation and interpretation of regulatory and company policies often result in inflexible access.
We all accept that certain data needs to be protected as it is sensitive, confidential, or regulated. However, having a blanket policy, driven by rigid access rules, creates challenges.
Access policies often inhibit data consumers from doing their jobs. Typically, to overcome these issues, the data consumer has to track down a data owner and make a special access request. This seems a tad antiquated and can be a very slow process!
In this blog, we use Palisade for Policy and Access Enforcement Services (PAES) and Egeria to maintain and exchange metadata labels that span all of an organization’s data repositories. We explore how a Context-Based PAES can be driven by metadata labels to resolve data access issues. Please refer to Blog 2 – Defining Metadata – which details how a distributed metadata and governance tool can provide an enterprise view and understanding of all the data assets in an organization.
What is Context – for Data Access?
Context focuses on the rationale and circumstances of the data request. When a request for data is made, query-time contextual information about that request is evaluated. The contextual information typically consists of user attributes, the query time, contexts such as location, and the purpose of the data request.
Most access control applications take a rigid approach to data access, using only the user’s role to determine what access they are assigned. With context, access is set at query time, based on a number of factors that are tailored to that individual, situation, and combination of data assets.
A simple dataset example:
Let’s consider the context surrounding the data assets and combinations of those assets for an employee file. This file contains the assets:
Employee_Name, Employee_Mobile, Employee_Gross_Salary, Employee_Age.
When these items are shared, they provide sensitive and confidential information about the individual.
However, Employee_Name and Employee_Mobile are often shared in company internal phone listings, in a harmless manner.
Likewise, Employee_Gross_Salary and Employee_Age on their own are low risk as they do not identify an individual.
Including more context for each data request provides a new level of flexibility. Context now drives what data is redacted or withheld, enabling the data consumer to access a wider variety of data assets in situations that meet organizational requirements and regulations.
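To make the employee file example concrete, here is a minimal Python sketch of combination-based redaction. It is only an illustration: the grouping of fields into IDENTIFYING and SENSITIVE, the function names, and the "***" mask are all assumptions made for this example and are not part of Palisade or Egeria.

```python
# Hypothetical grouping of the employee file's fields (an assumption for this sketch).
IDENTIFYING = {"Employee_Name", "Employee_Mobile"}
SENSITIVE = {"Employee_Gross_Salary", "Employee_Age"}


def risky_combination(requested_fields):
    """Return True when a request pairs identifying fields with sensitive ones."""
    requested = set(requested_fields)
    return bool(requested & IDENTIFYING) and bool(requested & SENSITIVE)


def redact(record, requested_fields):
    """Return the requested fields, masking sensitive values if the combination is risky."""
    mask = risky_combination(requested_fields)
    return {
        field: "***" if mask and field in SENSITIVE else record[field]
        for field in requested_fields
    }
```

With this sketch, a request for name and mobile alone, or for salary and age alone, comes back unmasked, while a request that combines a name with a salary has the salary masked.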
So how does this work?
An organization can configure a set of labels to represent contextual information of interest. The data consumer now additionally selects the relevant predefined context information for each data request.
When the PAES receives the data request, the access policy is applied, and where appropriate, the data is redacted or masked. In this example, we combined context with user credentials when evaluating each data request; based on this combination, the user’s access profile is dynamically tailored per data request.
For context to be useful, the business needs to identify what context is important for accessing data. It is also important to note that the context may not be entirely based on the data items alone; the identity of the data consumer, their location, or even the time/date of the request and more can be evaluated.
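The idea of combining user credentials with selected context can be sketched in a few lines of Python. This is a hedged illustration, not Palisade’s implementation: the DataRequest class, the POLICY table, the role and purpose labels, the TRUSTED_LOCATIONS set, and the rule that salary is only released at a trusted location are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRequest:
    role: str        # user attribute, e.g. supplied by a user service
    purpose: str     # context label selected by the data consumer
    location: str    # context describing where the request comes from
    fields: tuple    # data assets being requested


# Hypothetical policy: (role, purpose) -> fields that may be returned unmasked.
POLICY = {
    ("hr_analyst", "payroll_audit"): {"Employee_Name", "Employee_Gross_Salary", "Employee_Age"},
    ("hr_analyst", "salary_statistics"): {"Employee_Gross_Salary", "Employee_Age"},
    ("staff", "phone_directory"): {"Employee_Name", "Employee_Mobile"},
}
TRUSTED_LOCATIONS = {"office"}  # assumption: salary is only released here


def evaluate(request, record):
    """Tailor the access profile for one request and redact everything else."""
    allowed = POLICY.get((request.role, request.purpose), set())
    if request.location not in TRUSTED_LOCATIONS:
        allowed = allowed - {"Employee_Gross_Salary"}
    return {
        field: record[field] if field in allowed else "REDACTED"
        for field in request.fields
    }
```

The same user can receive different results for the same record depending on the purpose and location they declare, which is the flexibility that a role-only model lacks.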
What goes on behind the scenes!
To accurately create a set of context policies, an organization should have an encompassing metadata platform that provides an enterprise view of all metadata. The asset owners are responsible for classifying assets in the metadata layer. They need to identify each asset’s sensitivity so that organizational and legal access requirements are met. A glossary providing the semantic meaning of the assets in the metadata layer greatly helps the security officer understand the data assets. In the open metadata layer, the glossary and assets are linked; this is essential to gain an organizational understanding of the data assets.
Let’s walk through how this works…
- The Data Consumer (user) uses a client to read data by providing an identifier to the data, their user credentials, and any required contextual information.
- The user is authenticated with the user service, and all the user’s attributes are provided back to the Palisade Service.
- The Palisade Service now gets the Egeria Resource Service to provide full mapping of all required data resources (assets) along with how to find and connect to those resources. At this point, Egeria checks that the user is allowed access to the assets based on who they are and where they are.
- The Policy Service first applies any coarse-grain rules using the user’s attributes, query-time contextual information, and the resource metadata. It then retrieves the fine-grain (record-level) rules, which can only be applied with access to the individual data records.
- The Palisade Service caches the contextual information, user attributes, and the mapping of resource details to each resource’s fine-grain rules. These details are all cached under a token, which the Data Service can later use to retrieve them.
- The Palisade Service returns the token along with the mapping of resources to data service connection details the user is allowed to access.
- The request, outcome, and summary of the data provided are logged via the Audit Service.
- Per resource, a request is made to the Data Service passing through the token and resource details.
- The Data Service uses the token to request from the Palisade Service all the information required to enforce the fine-grain rules on the data.
- Then the Data Service connects to the data and applies the rules which transform and filter the data before it is streamed back to the client.
For more details, refer to “Standard Flow for a read request through the Palisade system.”
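The token hand-off in the steps above can be sketched as a small in-memory model. This is an assumption-laden illustration in Python, not the real Palisade API: the class names PalisadeServiceSketch and DataServiceSketch, their methods, the rule signature (record, user, context), and the two example rules are all hypothetical. Authentication, resource discovery through Egeria, coarse-grain rules, and auditing are left out.

```python
import uuid


class PalisadeServiceSketch:
    """Caches user attributes, context, and per-resource fine-grain rules under a token."""

    def __init__(self):
        self._cache = {}

    def register_request(self, user_attributes, context, resource_rules):
        token = str(uuid.uuid4())
        self._cache[token] = {"user": user_attributes, "context": context, "rules": resource_rules}
        # Return the token plus the resources the user may read.
        return token, list(resource_rules)

    def lookup(self, token, resource):
        entry = self._cache[token]
        return entry["user"], entry["context"], entry["rules"][resource]


class DataServiceSketch:
    """Reads records and applies the fine-grain rules before streaming them back."""

    def __init__(self, palisade, storage):
        self._palisade = palisade
        self._storage = storage  # hypothetical: resource name -> list of records

    def read(self, token, resource):
        user, context, rules = self._palisade.lookup(token, resource)
        for record in self._storage[resource]:
            for rule in rules:
                record = rule(record, user, context)
                if record is None:  # a rule filtered the record out
                    break
            if record is not None:
                yield record


def mask_salary(record, user, context):
    """Hypothetical rule: show salary only for the 'payroll_audit' purpose."""
    if context.get("purpose") == "payroll_audit":
        return record
    return {**record, "Employee_Gross_Salary": "REDACTED"}


def drop_minors(record, user, context):
    """Hypothetical rule: withhold records of employees under 18."""
    return record if record["Employee_Age"] >= 18 else None
```

A client would call register_request once, then call read once per returned resource with the same token. Because each rule returns a new record or None, the stored data is never modified; only the streamed copy is transformed or filtered.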
The world is not black and white; it is full of many colors and shades. In the same way, security needs a level of flexibility to enable data consumers to access data safely. With a rigid set of access policies, a data consumer may only be able to see 50% of the data assets. When context is added, they may be able to see 90% of the data, with the extra 40% based on the context of their data request. This means they can perform their jobs in a safe and productive manner.
Context-Based Policy and Access Enforcement, driven by metadata labels, resolves data access issues that data consumers face daily. It provides a way to:
- enable users to safely access the data they need
- increase worker productivity
- automate complex data access
- increase auditability
- reduce employee frustration!
Get in touch and watch the Webex
Click here to view the Webex that accompanies this blog.
To Join the Egeria Project or Slack Channel, check out the following links:
- Contribute to ODPi Egeria
- Contact the team via Slack – join here & go to #egeria-discussions. We’d love to hear what you think!