Data-driven organizations require automated management and governance across all of their data, without being restricted to a single data format, a single data platform or even a single vendor’s offerings. ODPi Egeria (https://odpi.github.io/egeria/) is a new open source project that provides the open APIs, event formats, types and integration logic to make this possible.
The first release of Egeria focuses on creating a single virtual view of metadata. It can federate queries across different metadata repositories and synchronize metadata between them. The synchronization protocol controls what is shared and with which repositories, and it ensures that updates to metadata can be made with integrity.
Before delving into the details of Egeria, it is worth level-setting what we mean by metadata.
Scope of Metadata
Metadata describes data in all its forms. This includes where the data is located, how it is stored, how frequently it is changing, what it represents, how it is organized, who owns it, how accurate it is, who is using it and how it should be governed. It is the knowledge base that sits underneath all data activities.
A key deliverable of Egeria is its comprehensive definition of open metadata types (https://odpi.github.io/egeria/open-metadata-publication/website/open-metadata-types/) that provides the structures used to exchange metadata.
The Need for Integration
Metadata management is the process of maintaining metadata. Data tools and platforms often come with their own metadata repository and management services. These metadata repositories are for the exclusive use of their accompanying tools and reflect the data known to the users of those tools. The effect is that there are typically many metadata repositories operating in an organization, each with a different view of the data that the organization has access to.
The role of Egeria is to allow data tools and platforms to continue to work with their own metadata repositories, whilst ensuring the metadata used across the organization is consistent.
Open Metadata Repository Cohort
The metadata repositories are registered into what is called an open metadata repository cohort. An open metadata repository cohort has a common event topic on an event bus such as Apache Kafka (http://kafka.apache.org/). Each repository that is to share metadata puts a registration event on this topic. This registration event provides information about the repository, including its network address. The other repositories pick up the network address and use it to configure their local federation service. This federation service queries metadata from each of the repositories to create an enterprise view. The event topic is also used to replicate selected metadata between the repositories.
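To make the registration flow concrete, here is a minimal Python sketch of the idea. It is an illustration only and does not use Egeria's actual APIs or event formats: the names RegistrationEvent, CohortTopic, Repository, find_local and find_enterprise are all hypothetical, the event topic is an in-memory stand-in rather than a real Kafka topic, and the "network" is simply a dictionary that maps an address to a repository object. The sketch also assumes that the topic replays earlier events to late joiners so they can discover existing members.

```python
from dataclasses import dataclass


@dataclass
class RegistrationEvent:
    """Hypothetical registration event: announces a repository and its address."""
    repository_name: str
    network_address: str


class CohortTopic:
    """In-memory stand-in for the cohort's shared event topic."""

    def __init__(self):
        self.events = []       # retained so that late joiners can catch up (assumption)
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)
        for event in self.events:      # replay earlier registrations
            callback(event)

    def publish(self, event):
        self.events.append(event)
        for callback in self.subscribers:
            callback(event)


class Repository:
    """Hypothetical metadata repository with a very simple federation service."""

    def __init__(self, name, address, metadata):
        self.name = name
        self.address = address
        self.metadata = dict(metadata)
        self.known_members = {}        # repository name -> network address

    def join(self, topic, network):
        network[self.address] = self   # make this repository reachable
        topic.subscribe(self._on_event)
        topic.publish(RegistrationEvent(self.name, self.address))

    def _on_event(self, event):
        # Record the address of every other member; ignore our own registration.
        if event.repository_name != self.name:
            self.known_members[event.repository_name] = event.network_address

    def find_local(self, term):
        return {key: value for key, value in self.metadata.items() if term in key}

    def find_enterprise(self, term, network):
        # Federated query: ask the local store, then every registered member.
        results = {self.name: self.find_local(term)}
        for name, address in self.known_members.items():
            results[name] = network[address].find_local(term)
        return results


topic = CohortTopic()
network = {}
catalog = Repository("catalog", "https://catalog.example.org",
                     {"customer_table": "owner: sales"})
lake = Repository("lake", "https://lake.example.org",
                  {"customer_events": "owner: marketing", "orders": "owner: finance"})
catalog.join(topic, network)
lake.join(topic, network)
enterprise_view = catalog.find_enterprise("customer", network)
print(enterprise_view)
```

Each repository keeps its own metadata store and learns about the other members only through the shared topic, which is why adding a repository to the cohort in this sketch requires no change to the repositories that are already registered. Replication of selected metadata over the same topic is left out to keep the example short.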
Open Metadata Repository Services (OMRS)
The OMRS (https://odpi.github.io/egeria/open-metadata-implementation/repository-services/) is the component within ODPi Egeria that supports the federation and metadata exchange services. It is integrated with each repository (https://odpi.github.io/egeria/open-metadata-publication/website/open-metadata-integration-patterns/) and manages all of the open metadata protocols.
More to Come
And this is only the start of Egeria’s journey. Stay tuned as we add open governance capability for Chief Data Officers, Privacy Officers, Data Owners, Data Scientists and much more. To get involved, join us at the Egeria GitHub page.