Becoming a Data Driven Organization
Many organizations today wish to become Data Driven. This means that decision makers and applications can easily locate and use data. This sounds like it should be simple, but it takes a good understanding of your data, backed by an effective data management program, to ensure accurate and reliable insights.
Throughout your organization’s IT systems, data is copied, enriched, manipulated and duplicated. This causes complexity and exponential growth of the data. The end result is:
- uncontrolled duplication of data
- confusion over data heritage and lineage
- misunderstanding and confusion when different teams derive conflicting results
- inability to deal with legislation, such as GDPR
- inconsistent access, security issues and governance over multiple data sets
Without enterprise metadata management, data management becomes a challenge and the business will struggle to become data driven.
The issue with metadata is that it is ‘siloed’ in each application or data store. To overcome this the metadata needs to be stitched together. Some metadata management tools provide this capability in part. Unfortunately, no one tool covers the entire enterprise’s data landscape.
ODPi Egeria is an Open Source metadata management capability that provides a new Open Standard for metadata exchange and consumption. The Egeria platform uses a peer-to-peer protocol that connects disparate metadata repositories, enabling interoperability and governance across the entire metadata landscape!
Here are my top Seven Savvy Skills that we are continuing to deliver via the ODPi Egeria project, that enable an organization to become data driven:
1. Ultimate Source and Destination for data assets, or Full Lineage
This is the ability to stitch together the possible journeys that data takes as it flows from inception to all end points. This requires capturing each step of the data’s journey and identifying where the data has been manipulated.
With Egeria’s ability to integrate the different tools involved in the data journey, it is possible to follow what happened to the data. This makes it a simple task to discover the possible sources and destinations of a data value, or to perform impact analysis when changes occur in the IT landscape. Imagine how easily you could identify all the locations that need to be addressed when dealing with GDPR if you had an enterprise-wide map of all your data assets.
We consider many Data Lineage patterns in Egeria; the popular ones are Design, Operational, Vertical, Horizontal, Historical and Glossary Lineage. I will be covering these patterns in a future blog.
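To make the idea of sources, destinations and impact analysis concrete, here is a minimal sketch of lineage as a directed graph. It is not Egeria's API: the `LineageGraph` class, its methods and the asset names are all hypothetical, chosen only to illustrate the traversal.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical in-memory lineage map: nodes are assets, edges are data flows."""

    def __init__(self):
        self._downstream = defaultdict(set)  # asset -> assets it feeds
        self._upstream = defaultdict(set)    # asset -> assets it is fed by

    def add_flow(self, source, target):
        """Record one captured step of a data journey."""
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    @staticmethod
    def _reach(start, edges):
        """Breadth-first walk returning every asset reachable from start."""
        seen = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        seen.discard(start)  # a cycle should not list the asset as its own source
        return seen

    def destinations(self, asset):
        """Everything affected if this asset changes (impact analysis)."""
        return self._reach(asset, self._downstream)

    def sources(self, asset):
        """Everything that could have contributed to this asset."""
        return self._reach(asset, self._upstream)


# Assumed example flows; the asset names are made up.
graph = LineageGraph()
graph.add_flow("crm.customers", "staging.customers")
graph.add_flow("web.signups", "staging.customers")
graph.add_flow("staging.customers", "warehouse.dim_customer")
graph.add_flow("warehouse.dim_customer", "reports.churn")

print(sorted(graph.destinations("crm.customers")))
print(sorted(graph.sources("reports.churn")))
```

For a GDPR request, `destinations("crm.customers")` would list every downstream location that may hold a copy of the customer data.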
2. Provenance of Metadata
Just as it is key in the art world to understand the provenance of a painting to determine if it is fake or worth a fortune, we also need to understand the origin of metadata as it is gathered from different tools. Understanding the provenance of your metadata verifies that it has been captured and managed by an authoritative source.
3. Time Travel Through the Data Landscape
Imagine being able to go back to a point in time and verify how the data was processed, or to identify if and when something changed and altered the asset. Whether this involves looking at the metadata as it was last week or a year ago, the historical capture of lineage by Egeria makes it possible to follow the data journey and review all touch points.
Egeria typically uses a dedicated graph repository to hold the lineage history. This is a fully searchable repository and provides a change audit log for all changes in data flows.
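As a rough illustration of "time travel", the sketch below keeps every version of an asset's metadata and answers "what did this look like on a given date?". It is a simplified stand-in, not how Egeria's repository is implemented: `VersionedMetadata`, its methods and the sample properties are assumptions made for the example.

```python
from bisect import bisect_right
from datetime import datetime


class VersionedMetadata:
    """Hypothetical store that keeps each metadata version with its timestamp."""

    def __init__(self):
        self._history = {}  # asset -> list of (timestamp, properties), oldest first

    def record(self, asset, when, properties):
        """Save a new version instead of overwriting the old one."""
        versions = self._history.setdefault(asset, [])
        versions.append((when, dict(properties)))
        versions.sort(key=lambda version: version[0])

    def as_of(self, asset, when):
        """Return the metadata as it stood at the given time, or None if not yet known."""
        versions = self._history.get(asset, [])
        times = [timestamp for timestamp, _ in versions]
        index = bisect_right(times, when)
        return versions[index - 1][1] if index else None

    def changes(self, asset):
        """Audit log: for each later version, which properties differ from the previous one."""
        versions = self._history.get(asset, [])
        log = []
        for (_, old), (when, new) in zip(versions, versions[1:]):
            changed = sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))
            log.append((when, changed))
        return log


# Assumed sample data.
store = VersionedMetadata()
store.record("warehouse.dim_customer", datetime(2024, 1, 1), {"columns": ["id", "name"]})
store.record("warehouse.dim_customer", datetime(2024, 3, 1),
             {"columns": ["id", "name", "email"], "owner": "sales"})

print(store.as_of("warehouse.dim_customer", datetime(2024, 2, 1)))
print(store.changes("warehouse.dim_customer"))
```

Keeping versions rather than overwriting them is what allows both the point-in-time query and the change audit log to be answered from the same history.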
4. Data Awareness and Notifications
With a real-time distributed metadata management system you gain a new awareness of all data assets in the organisation. Each time a change is made to a data asset, such as a new column being added to a table or a new transform being applied, the event is logged and distributed in real time so subscribers (typically other tools) are immediately alerted. No matter which tools are used, data consumers will be able to follow the evolution of the data landscape.
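The publish and subscribe pattern behind these notifications can be sketched in a few lines. This is an in-process stand-in for a real distributed event bus, and none of it is Egeria's API: `MetadataEvent`, `EventBus` and the change names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataEvent:
    """Hypothetical event describing one change to a data asset."""
    asset: str
    change: str   # e.g. "column_added" or "transform_applied" (assumed names)
    detail: str


class EventBus:
    """Minimal publish/subscribe hub; a real deployment would use a distributed bus."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # change type -> handlers
        self.log = []                          # every event is logged, subscribed to or not

    def subscribe(self, change, handler):
        self._subscribers[change].append(handler)

    def publish(self, event):
        self.log.append(event)
        for handler in list(self._subscribers.get(event.change, [])):
            handler(event)


bus = EventBus()
catalog_inbox = []  # stands in for another tool that wants to hear about new columns
bus.subscribe("column_added", catalog_inbox.append)

bus.publish(MetadataEvent("warehouse.dim_customer", "column_added", "email"))
bus.publish(MetadataEvent("warehouse.dim_customer", "transform_applied", "mask_email"))

print(catalog_inbox)
print(len(bus.log))
```

The subscriber only hears about the change types it registered for, while the log keeps the full record of what happened.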
5. Governance, Access and Security
Data Stores and Applications each have their own governance, access and security, which is fine when they operate in isolation. When they are part of a wider data ecosystem, however, consistency is required.
There are many patterns for creating governance, access and security in Egeria. These are based on the type of data, its location, its origin or the purpose for which it is being used. The policies are driven by Egeria’s metadata so they are always in sync with the data that exists in the data landscape. I will cover this in a future blog about “The Data Lake and Asset Access Maturity Model”.
Audit logging of access requests is also key. It is all about knowing who accessed a data asset and the breadth of data access requested by an individual. These are two interesting insights to have when investigating data leaks or fraud.
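A small sketch can show how metadata-driven policy and audit logging fit together. Everything here is assumed for illustration (the `AccessPolicy` class, the role and classification names, the rule that a user needs at least one permitted role per restricted classification); it does not describe Egeria's actual governance engine.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    """Hypothetical policy: each restricted classification lists the roles allowed to see it."""
    allowed_roles: dict                       # classification -> set of permitted roles
    audit_log: list = field(default_factory=list)

    def request(self, user, roles, asset, classifications):
        """Decide an access request from the asset's metadata and log the outcome."""
        allowed = all(
            set(roles) & self.allowed_roles[c]
            for c in classifications
            if c in self.allowed_roles
        )
        self.audit_log.append((user, asset, allowed))
        return allowed

    def who_requested(self, asset):
        """Who asked for this asset, whether or not access was granted."""
        return {user for user, logged_asset, _ in self.audit_log if logged_asset == asset}

    def assets_requested_by(self, user):
        """Breadth of data one individual has asked for."""
        return {asset for logged_user, asset, _ in self.audit_log if logged_user == user}


policy = AccessPolicy({"PII": {"data_steward"}})
print(policy.request("alice", {"data_steward"}, "crm.customers", {"PII"}))
print(policy.request("bob", {"analyst"}, "crm.customers", {"PII"}))
print(policy.request("bob", {"analyst"}, "reports.churn", set()))
print(sorted(policy.assets_requested_by("bob")))
```

Because the decision reads the asset's classifications rather than a per-tool rule, the same policy applies wherever the asset lives, and the log answers both audit questions from one record.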
6. Understanding for All, through Business Glossaries
There are many naming conventions for data assets, and the resulting names often look nonsensical. Using a Business Glossary enables a ‘Business Definition’ to be created, such as “customer name”, which could be linked to multiple data items to show they contain the customer name. Having a Glossary for metadata makes the data simple to understand from both the technical and the business perspective.
Once a Glossary is in place then classifications can be created, such as “PII” or “Company Sensitive”, and associated with glossary terms to show data of that type has the attached classification. This is another powerful capability if you are looking to add security based on Egeria’s metadata.
A single piece of data can be used in many contexts by different groups and individuals. Consumers and creators of data know and understand the data they work with in a unique way. With the Egeria enabled metadata interchange, the organization’s use of data is collected and any feedback from the data consumers is passed to the data owners to collectively improve the quality and understanding of that data!
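The link between glossary terms, data items and classifications can be sketched as follows. This is an illustrative model only: the `Glossary` class, its methods and the column names are hypothetical and are not Egeria's glossary API.

```python
from collections import defaultdict


class Glossary:
    """Hypothetical glossary: terms link to data items, classifications attach to terms."""

    def __init__(self):
        self._term_assets = defaultdict(set)   # term -> data items that carry it
        self._term_classes = defaultdict(set)  # term -> classifications on the term

    def link(self, term, asset):
        """State that a data item contains the thing the term defines."""
        self._term_assets[term].add(asset)

    def classify(self, term, classification):
        """Attach a classification such as "PII" to a term."""
        self._term_classes[term].add(classification)

    def classifications_for(self, asset):
        """A data item inherits the classifications of every term linked to it."""
        return {
            c
            for term, assets in self._term_assets.items()
            if asset in assets
            for c in self._term_classes.get(term, ())
        }

    def assets_with(self, classification):
        """All data items that carry a classification through their terms."""
        return {
            asset
            for term, classes in self._term_classes.items()
            if classification in classes
            for asset in self._term_assets.get(term, ())
        }


# Assumed example: two oddly named columns that both hold the customer name.
glossary = Glossary()
glossary.link("customer name", "crm.customers.CUST_NM")
glossary.link("customer name", "warehouse.dim_customer.full_name")
glossary.classify("customer name", "PII")

print(glossary.classifications_for("crm.customers.CUST_NM"))
print(sorted(glossary.assets_with("PII")))
```

Classifying the term once marks every linked data item, which is why the glossary is a convenient place to hang security rules.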
Egeria provides the mechanism to govern and understand an organization’s complete data landscape by linking the metadata distributed across many tools. For me, it is the metadata equivalent of stitching together all the isolated islands, continents, oceans and seas to make an interactive map of the world. With the map in place, you have complete understanding of the landscape and can build out new capabilities with confidence.
Next week I will be blogging about “User Profiles and Personas – It’s a granularity thing”.