How should an Egeria OMAS find entities and relationships?
An Open Metadata Access Service (OMAS) is a specialized set of APIs and events intended to make using open metadata easier for a specific community of developers. New OMASs can be contributed directly to the Egeria project, or developed/distributed independently. This blog post should be of interest to anyone writing an OMAS.
An OMAS often needs to create or retrieve an entity or a relationship.
The following patterns are common:
- The OMAS creates an entity then continues to work with it. The addEntity() method of the Metadata Collection interface returns the EntityDetail object. The OMAS can keep that object around and operate on the entity. If the OMAS retains knowledge of the entity GUID, it can later use the getEntity() method to retrieve the same entity again. The same pattern is possible with getRelationship(). If the GUID is available, getting the instance is straightforward.
- The OMAS needs to retrieve an entity or relationship that was created earlier, and it does not know the entity or relationship GUID. In this case the OMAS can use one of the ‘find’ methods to search for the entity or relationship instance. The OMAS may expect to get back exactly one instance, or it may expect a set of instances. In either case, if a set of instances is found, the OMAS may filter it to identify the particular entity or relationship it needs.
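The two patterns can be sketched as follows. This is a minimal illustration against a hypothetical in-memory store, not the Egeria API: the EntityLookupSketch class, its Entity type and the add(), getByGuid() and findByQualifiedName() methods are all invented for this example, and their signatures are much simpler than the real Metadata Collection methods.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.regex.Pattern;

// Hypothetical in-memory stand-in for a metadata repository; not the Egeria API.
final class EntityLookupSketch {

    // Minimal entity: a generated GUID plus a single string property.
    static final class Entity {
        final String guid;
        final String qualifiedName;

        Entity(String guid, String qualifiedName) {
            this.guid = guid;
            this.qualifiedName = qualifiedName;
        }
    }

    private final Map<String, Entity> entitiesByGuid = new LinkedHashMap<>();

    // Pattern 1: create the entity and hand it back so the caller can keep it or its GUID.
    Entity add(String qualifiedName) {
        Entity entity = new Entity(UUID.randomUUID().toString(), qualifiedName);
        entitiesByGuid.put(entity.guid, entity);
        return entity;
    }

    // Pattern 1 (later): a known GUID gives direct retrieval; returns null if unknown.
    Entity getByGuid(String guid) {
        return entitiesByGuid.get(guid);
    }

    // Pattern 2: no GUID, so search by property value; the result may hold 0, 1 or many.
    List<Entity> findByQualifiedName(String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<Entity> found = new ArrayList<>();
        for (Entity entity : entitiesByGuid.values()) {
            if (pattern.matcher(entity.qualifiedName).matches()) {
                found.add(entity);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        EntityLookupSketch repository = new EntityLookupSketch();
        Entity created = repository.add("orders");
        repository.add("orders"); // nothing in this sketch enforces uniqueness

        System.out.println(repository.getByGuid(created.guid) == created);
        System.out.println(repository.findByQualifiedName("orders").size());
    }
}
```

Retrieval by GUID returns at most one instance, whereas the search returns a list that the caller has to inspect.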
To keep things brief, the remainder of this article focuses on entities. Working with relationships is similar.
Finding things in a metadata repository
If an OMAS needs to find an entity or relationship, and the instance GUID is not known, the OMAS can use one of the find methods to search for it in the metadata repositories. The find methods are on the Metadata Collection interface supported by OMRS repositories.
To be precise, we are referring to the OMRSMetadataCollection class. This class is extended by the EnterpriseOMRSMetadataCollection class that the OMAS has access to via its Enterprise Connector.
The Metadata Collection interface provides the findEntitiesByProperty() and findEntitiesByPropertyValue() methods. The first method accepts a ‘match properties’ object which can be used to specify a ‘match’ value for each property. The second method accepts a search string which is compared to all string properties. There are similar methods for finding relationships.
Exact match or regular expression?
A string used as a match property or search string can be used as an exact match or as a regular expression. The author of an OMAS needs to consider what the end-user is expecting when the OMAS performs a search. The author can then decide whether a string should be matched exactly or treated as a regular expression (‘regex’).
The Metadata Collection interface treats all search strings as regular expressions.
If the author expects a string to be regex-matched, they should compose the string as a regular expression and pass it to the Metadata Collection interface.
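To see why this matters, here is a minimal sketch that uses plain java.util.regex rather than the Egeria API. It assumes a repository that applies full Java regex matching to the whole property value. The RegexSearchSketch class and the property values are illustrative only.

```java
import java.util.regex.Pattern;

// Illustrative only: mimics a repository that treats every search string as a regex.
final class RegexSearchSketch {

    // Returns true if the whole property value matches the search string as a regex.
    static boolean matches(String searchString, String propertyValue) {
        return Pattern.matches(searchString, propertyValue);
    }

    public static void main(String[] args) {
        // The literal-looking string matches itself...
        System.out.println(matches("sales.db", "sales.db"));
        // ...but '.' is a wildcard, so it also matches a value the user did not ask for.
        System.out.println(matches("sales.db", "salesXdb"));
        // A deliberately composed regex matches a family of values.
        System.out.println(matches("sales.*", "sales_archive"));
    }
}
```

A search string that looks like a plain literal can therefore match more values than the end-user intended, which is the case the exact-match helper methods are meant to address.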
If an OMAS author wants an exact match, there is a set of helper methods in the OMRSRepositoryHelper. These methods support escaping of relatively simple search strings in a manner that is supported by most OMRS repository connectors, including those for repositories that do not support full regular expression syntax. An OMAS author should always use the repository helper methods when they can. For more complex searches, beyond the level supported by the helper methods, an OMAS author should implement their own regular expression, but it is important to be aware that not all repositories will support all regular expressions. The regular expressions provided by the helper are a minimal set that most repositories are able to support. More complex expressions can be used with repositories that have full regex processing, such as the in-memory repository or graph repository.
For example, if an OMAS author wants an exact match of a string, they should call getExactMatchRegex(), which will ‘escape’ the whole string, regardless of content. This helper method frames the whole string with \Q and \E escape characters. It’s OK to call getExactMatchRegex() even if the string value contains only alphanumeric characters and no regex special characters. However, it should only be used for escaping a single, simple string; don’t use it for a string that already contains either of these escape sequences. Also, don’t use it to build up complex regular expressions.
Here’s an example. Metadata objects frequently have compound names composed of multiple fields with separators. For example, an OMAS may need to retrieve the entity with qualifiedName equal to '[table “” not found /]. Some of these characters are special characters in a regex. If the OMAS needs an exact match, it can call getExactMatchRegex() to escape the search string. Although the string contains regex special characters, the search will only return entities with exactly that value.
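The effect of escaping can be sketched with the standard Java library. The exactMatchRegex() method below is a hypothetical stand-in that assumes the helper behaves like java.util.regex.Pattern.quote(), and the qualified name is made up for illustration. Neither is taken from the Egeria code base.

```java
import java.util.regex.Pattern;

// Illustrative stand-in for exact-match escaping; not the Egeria helper itself.
final class ExactMatchSketch {

    // Assumption: escaping frames the literal with \Q and \E, as Pattern.quote does.
    static String exactMatchRegex(String literal) {
        return Pattern.quote(literal);
    }

    // True only if the value is exactly the literal, whatever characters it contains.
    static boolean matchesExactly(String literal, String value) {
        return Pattern.matches(exactMatchRegex(literal), value);
    }

    public static void main(String[] args) {
        // Hypothetical compound name containing regex special characters.
        String qualifiedName = "(host)=db1.example::(table)=orders[2020]";

        // Shows the literal framed by \Q and \E.
        System.out.println(exactMatchRegex(qualifiedName));
        // Escaped: matches the identical value and nothing else.
        System.out.println(matchesExactly(qualifiedName, qualifiedName));
        System.out.println(matchesExactly(qualifiedName, "(host)=db1Xexample::(table)=orders[2020]"));
        // Unescaped: the parentheses become groups, so the name does not even match itself.
        System.out.println(Pattern.matches(qualifiedName, qualifiedName));
    }
}
```

The last line shows the risk in the other direction: used as a raw regex, a compound name may fail to match the very value it was copied from.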
Exact match of a substring
OMRSRepositoryHelper also provides helper methods that will escape a string and build a regex around it so it will match values that contain, start with or end with the original string value. These methods combine exact match processing with relatively simple regex substring expressions. If an OMAS needs a more complicated regex, the author should code it directly instead of using the OMRSRepositoryHelper methods.
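A minimal sketch of the idea, again using only the standard Java library. The method names containsRegex(), startsWithRegex() and endsWithRegex() are hypothetical, and the sketch assumes the regex is built by wrapping the escaped literal with ‘.*’; check the OMRSRepositoryHelper Javadoc for the real method names and behaviour.

```java
import java.util.regex.Pattern;

// Illustrative only: escaped literal combined with simple substring expressions.
final class SubstringMatchSketch {

    // Matches any value that contains the literal anywhere.
    static String containsRegex(String literal) {
        return ".*" + Pattern.quote(literal) + ".*";
    }

    // Matches any value that begins with the literal.
    static String startsWithRegex(String literal) {
        return Pattern.quote(literal) + ".*";
    }

    // Matches any value that ends with the literal.
    static String endsWithRegex(String literal) {
        return ".*" + Pattern.quote(literal);
    }

    // Whole-value match, as a repository with full regex support might apply it.
    static boolean matches(String regex, String value) {
        return Pattern.matches(regex, value);
    }

    public static void main(String[] args) {
        System.out.println(matches(containsRegex("a.b"), "xxa.byy"));
        System.out.println(matches(containsRegex("a.b"), "xxaXbyy"));
        System.out.println(matches(startsWithRegex("[x]"), "[x]tail"));
        System.out.println(matches(endsWithRegex("[x]"), "tail[x]"));
    }
}
```

Only the wrapping ‘.*’ is interpreted as a regex; the literal in the middle is still matched character for character.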
Getting back more than you expected
Using an exact match doesn’t guarantee you will get only one entity, as there may be multiple entities that have a matching property value. Egeria is designed to be distributed and eventually consistent, so the repositories do not enforce uniqueness. Even if a property is ‘unique’ there may be more than one instance with that value within a cohort.
Filtering a search result
If an OMAS searches and gets back a set of entities, it may need to filter the set to identify an individual entity. The filtering might compare each search match property with the instance properties of each returned entity. However, if the OMRSRepositoryHelper methods were used to escape any match properties prior to the search, the OMAS would need to ‘unescape’ those match properties. It could do this by calling the getUnqualifiedLiteralString() method. Alternatively, the OMAS could construct a pair of match properties objects. One object is never escaped and the other is identical except it is escaped just prior to the search. The OMAS would then use the unescaped object for post-search filtering.
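Both filtering approaches can be sketched on plain strings. Everything here is an assumption made for illustration: escape() stands in for the helper’s exact-match escaping, unescape() is a simplified stand-in for the unescaping step that only strips a single \Q...\E frame, and the returned values imitate a repository that matched more loosely than requested.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: post-search filtering with an unescaped match value.
final class FilterResultsSketch {

    // Assumption: escaping frames the literal with \Q and \E, as Pattern.quote does.
    static String escape(String literal) {
        return Pattern.quote(literal);
    }

    // Simplified unescape: strips one \Q...\E frame, otherwise returns the input unchanged.
    static String unescape(String regex) {
        if (regex.length() >= 4 && regex.startsWith("\\Q") && regex.endsWith("\\E")) {
            return regex.substring(2, regex.length() - 2);
        }
        return regex;
    }

    // Keeps only the returned values that equal the unescaped literal.
    static List<String> filterExact(List<String> returnedValues, String literal) {
        List<String> kept = new ArrayList<>();
        for (String value : returnedValues) {
            if (literal.equals(value)) {
                kept.add(value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String literal = "orders[2020]";   // never escaped; kept for filtering
        String escaped = escape(literal);  // escaped just before the search

        // Pretend a loosely matching repository returned these property values.
        List<String> returned = Arrays.asList("orders[2020]", "orders[2020]_backup");

        // Approach 1: unescape the search value, then filter.
        System.out.println(filterExact(returned, unescape(escaped)));
        // Approach 2: filter with the copy that was never escaped.
        System.out.println(filterExact(returned, literal));
    }
}
```

Keeping an unescaped copy avoids the round trip through escaping and unescaping, at the cost of maintaining two objects that must stay in step.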