The Egeria team is kicking off the new year with a new release! Release 1.3 is now available and ready to help you start addressing data duplication issues.
With this release, when the Egeria discovery server identifies new metadata assets or when they are added manually, the stewardship services will scan them for duplicates. If a duplicate metadata asset is identified, it can be added to the de-duplication zone, and an annotated relationship is created for the duplicates.
The de-duplication zone is a best practice that we often use. This zone holds all duplicates until they are resolved. For more details on zones, see the zones documentation on our website or watch the governance zone video on our YouTube channel.
Notifications of the metadata duplicates are generated by the stewardship server and published as events.
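To make the flow above concrete, here is a minimal sketch in Python. It is an illustration only and does not use the Egeria API: the Asset class, the zone name, the event and relationship shapes, and the name-based matching rule are all assumptions made for this example.

```python
from dataclasses import dataclass, field

DEDUP_ZONE = "de-duplication"  # hypothetical zone name, for illustration only


@dataclass
class Asset:
    guid: str
    name: str
    zones: set = field(default_factory=set)


def match_key(asset):
    # Assumed matching rule: compare names ignoring case and whitespace.
    return "".join(asset.name.lower().split())


def scan_for_duplicates(new_asset, catalog, relationships, events):
    """Scan a newly added asset against the catalog for duplicate suspects."""
    for existing in catalog:
        if existing.guid == new_asset.guid:
            continue
        if match_key(existing) == match_key(new_asset):
            # Assumption: both suspects are placed in the de-duplication zone.
            existing.zones.add(DEDUP_ZONE)
            new_asset.zones.add(DEDUP_ZONE)
            # Record an annotated relationship between the two suspects.
            relationships.append({
                "type": "duplicate-suspect",
                "ends": (existing.guid, new_asset.guid),
                "annotation": "matched on normalized name",
            })
            # Publish a notification; here an event is just a dict in a list.
            events.append({
                "event": "duplicate-suspect-detected",
                "assets": [existing.guid, new_asset.guid],
            })
    catalog.append(new_asset)


catalog = [Asset("g1", "Customer Accounts")]
relationships, events = [], []
scan_for_duplicates(Asset("g2", "customer  accounts"), catalog, relationships, events)
```

After the call, both assets sit in the de-duplication zone, one relationship links them, and one event has been queued. A real deployment would use much richer matching than a normalized name.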
The Discovery and Stewardship Services
The purpose of the discovery services is to “discover” the data landscape. The stewardship services provide the ability to examine the metadata more closely and then resolve any issues.
Future plans for de-duplication
The 1.3 release is the first step in dealing with metadata asset duplication. Next year we will also look at enhancing de-duplication by enabling duplicate metadata assets to be merged to create a prime metadata asset.
When two duplicates are identified, such as A1 and A2, they will be merged to form A’. If A1 and A2 have conflicting metadata, the conflict will need to be carefully resolved through the stewardship service using “survivorship.” The metadata assets A1 and A2 will still exist in their respective metadata repositories; however, in Egeria, A’ will be stored to represent A1 and A2 and their duplicate relationship.
If a third duplicate, A3, is later identified, A’ and A3 are merged to form A”, and so on for any further duplicates.
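The planned merge can be pictured as a fold over the duplicates. The sketch below is hypothetical: this feature is not part of 1.3, the dictionary layout is invented for the example, and "newest value wins" is only one possible survivorship rule, assumed here for illustration.

```python
from functools import reduce


def prefer_latest(a, b):
    # Assumed survivorship rule: the more recently updated asset wins a conflict.
    return a if a["updated"] >= b["updated"] else b


def merge(a, b, survivor=prefer_latest):
    """Merge two assets into a new prime asset; the inputs are left untouched."""
    props, conflicts = {}, []
    for key in sorted(set(a["properties"]) | set(b["properties"])):
        in_a, in_b = key in a["properties"], key in b["properties"]
        if in_a and in_b and a["properties"][key] != b["properties"][key]:
            # Conflicting values: apply survivorship and remember the conflict.
            conflicts.append(key)
            props[key] = survivor(a, b)["properties"][key]
        else:
            props[key] = a["properties"][key] if in_a else b["properties"][key]
    return {
        "sources": a["sources"] + b["sources"],  # which assets the prime represents
        "updated": max(a["updated"], b["updated"]),
        "properties": props,
        "conflicts": a.get("conflicts", []) + conflicts,
    }


a1 = {"sources": ["A1"], "updated": 1,
      "properties": {"owner": "finance", "description": "Customer table"}}
a2 = {"sources": ["A2"], "updated": 2, "properties": {"owner": "sales"}}
a3 = {"sources": ["A3"], "updated": 3,
      "properties": {"description": "Customer master table"}}

a_prime = merge(a1, a2)                     # A' represents A1 and A2
a_double_prime = reduce(merge, [a1, a2, a3])  # A'' represents A1, A2 and A3
```

Because merge returns a new dictionary, a1, a2 and a3 are unchanged, which mirrors the originals remaining in their own repositories. In practice a steward would likely review each recorded conflict rather than rely on a single automatic rule.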
How this is implemented – The Stewardship Server and the Stewardship Action OMAS
As already mentioned, the Stewardship Server supports the scanning of assets and the notification when duplicate suspects are detected. It can also orchestrate the linking of duplicate assets.
This server is supported by:
The Stewardship Action OMAS, which supports the asset scan API, the recording of exceptions and duplicate suspects, and the linking of duplicate assets.
More details and Tutorials
For more details and tutorials, please check out the information on our website! https://egeria.odpi.org
Or join the conversation on our Slack Channel ODPI #Egeria-Discussions