
New Annual ODPi White Paper Focuses on Extending Artificial Intelligence Models to BI

Categories: Blog, ODPi BI and AI

AI that can be utilized by non-specialists, helping businesses drive results in a fraction of the time and cost

SAN FRANCISCO – November 7, 2019 – ODPi, a nonprofit organization accelerating the open ecosystem of big data solutions, today announced a new white paper that showcases the strategies of BI leaders and addresses how Business Intelligence (BI) is being impacted by Artificial Intelligence (AI) technologies. The annual research takes the pulse of industry leaders and covers how the “BI Endgame” is being addressed by corporate strategies that incorporate AI. The white paper was authored by the ODPi Technical Steering Committee, the ODPi BI & AI Project, ODPi members Index Analytics and SAS, plus current technical contributors from other BI vendors including Qlik, Microsoft, ThoughtSpot, MicroStrategy, and Tableau.

To view the complete white paper, click here: BI Endgame – When AI Meets BI

“BI needs to be more competitive to survive. By implementing AI that can be used easily by business users, BI can analyze data in ways that were previously only reserved for data scientists with advanced degrees. Without this, BI is at its endgame,” said Cupid Chan, CTO at Index Analytics, member of the ODPi Board of Directors and Technical Steering Committee (TSC), and Chairperson of the ODPi BI & AI Project. “This white paper shares key insights into how leaders in the field are implementing AI technologies into BI and how they view the future.”

By making AI intuitive and easy to implement, BI can empower business users to analyze data in ways that were previously reserved only for data scientists with advanced degrees. AI can aid in data exploration, automatically find patterns in data, and predict future outcomes to help businesses drive results in a fraction of the time and cost.

ODPi’s newest white paper, entitled “BI Endgame – When AI Meets BI,” presents best practices for combining BI and AI. ODPi members Index Analytics and SAS, plus technical contributors including Qlik, Microsoft, ThoughtSpot, MicroStrategy, and Tableau, present a wide range of real-world strategies on how they are implementing AI, the challenges they face, and where they are looking to enhance their investments.

Their unique perspectives include:

  • Developing AI capabilities that can be easily adopted by analysts in a low-code/no-code environment
  • Reducing or eliminating the gaps, both technical and organizational, between AI and BI teams within companies 
  • Integration with machine learning algorithms and platforms, including ODBC, gRPC, and REST, and democratizing access to insights driven by AI/ML platforms
  • Managing incoming data – AI and BI are only as good as the data that goes in, and must be supported by a well-governed data management program
  • Utilizing an analytics engine to uncover insights hidden in data, with AI-driven explanations revealing the value of data points within data visualizations
  • Building trust is the key to adoption of AI-infused analytics 

To view the complete white paper, click here: BI Endgame – When AI Meets BI

Hosted by The Linux Foundation, ODPi is an industry effort that aims to accelerate the adoption of big data technologies. Through a vendor-neutral, industry-wide approach to data governance and data science, ODPi members bring maturity and choice to an open ecosystem. 

About ODPi

ODPi is a nonprofit organization committed to simplification and standardization of the big data ecosystem. As a shared industry effort, ODPi members represent big data technology, solution provider and end user organizations focused on promoting and advancing the state of big data technologies for the enterprise. For more information about ODPi, please visit: http://odpi.org

###

ODPi Egeria: How to find entities and relationships

Categories: Blog, ODPi Egeria, Tech Deep Dive

How should an Egeria OMAS find entities and relationships?

An Open Metadata Access Service (OMAS) is a specialized set of APIs and events intended to make using open metadata easier for a specific community of developers. New OMASs can be contributed directly to the Egeria project, or developed/distributed independently. This blog post should be of interest to anyone writing an OMAS.

An OMAS often needs to create or retrieve an entity or relationship.

The following patterns are common:

  • The OMAS creates an entity and then continues to work with it. The addEntity() method of the Metadata Collection interface returns the EntityDetail object. The OMAS can keep that object around and operate on the entity. If the OMAS retains knowledge of the entity GUID, it can later use the getEntity() method to retrieve the same entity again. The same pattern is possible with addRelationship() and getRelationship(). If the GUID is available, getting the instance is straightforward, as sketched in the example after this list.
  • The OMAS needs to retrieve an entity or relationship that was created earlier, but does not know the entity or relationship GUID. In this case the OMAS can use one of the ‘find’ methods to search for the entity or relationship instance. The OMAS may expect to get back exactly one instance, or it may expect a set of instances. In either case, if a set of instances is found, the OMAS may filter it to identify the particular entity or relationship it needs.
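
To make the first pattern concrete, here is a minimal sketch. It assumes a metadataCollection obtained from the OMAS’s Enterprise Connector, plus placeholder values for userId, entityTypeGUID and initialProperties (an InstanceProperties object); note that on the OMRSMetadataCollection interface the method that retrieves a full entity is named getEntityDetail().

import org.odpi.openmetadata.repositoryservices.connectors.stores.metadatacollectionstore.properties.instances.EntityDetail;
import org.odpi.openmetadata.repositoryservices.connectors.stores.metadatacollectionstore.properties.instances.InstanceStatus;

// Create the entity; addEntity() returns the new EntityDetail.
EntityDetail created = metadataCollection.addEntity(
        userId,
        entityTypeGUID,       // GUID of the entity's TypeDef (placeholder)
        initialProperties,    // InstanceProperties for the new entity (placeholder)
        null,                 // no initial classifications
        InstanceStatus.ACTIVE);

// Retain the GUID so the same entity can be retrieved later.
String entityGUID = created.getGUID();

// ... later, retrieve the entity again by its GUID ...
EntityDetail retrieved = metadataCollection.getEntityDetail(userId, entityGUID);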

To keep things brief, the remainder of this article focuses on entities. Working with relationships is similar.

Finding things in a metadata repository

If an OMAS needs to find an entity or relationship, and the instance GUID is not known, the OMAS can use one of the find methods to search for it in the metadata repositories. The find methods are on the Metadata Collection interface supported by OMRS repositories.

To be precise, we are referring to the OMRSMetadataCollection class. This class is extended by the EnterpriseOMRSMetadataCollection class that the OMAS has access to via its Enterprise Connector.

The Metadata Collection interface provides the findEntitiesByProperty() and findEntitiesByPropertyValue() methods. The first method accepts a ‘match properties’ object which can be used to specify a ‘match’ value for each property. The second method accepts a search string which is compared to all string properties. There are similar methods for finding relationships.
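
For illustration, a broad search across all string properties might look like the sketch below. The parameter values are placeholders, and passing null for the entity type GUID means all entity types are searched.

// Search the string properties of all entity types for the searchCriteria value.
List<EntityDetail> results = metadataCollection.findEntitiesByPropertyValue(
        userId,
        null,            // entityTypeGUID: null means all entity types
        searchCriteria,  // treated as a regular expression (see the next section)
        0,               // fromEntityElement: paging offset
        null,            // limitResultsByStatus: no status filter
        null,            // limitResultsByClassification: no classification filter
        null,            // asOfTime: null means search the current instances
        null,            // sequencingProperty
        null,            // sequencingOrder
        50);             // pageSize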

Exact match or regular expression?

A string supplied as a match property or search string can be treated as an exact match or as a regular expression. The author of an OMAS needs to consider what the end-user is expecting when the OMAS performs a search. The author can then decide whether a string should be matched exactly or treated as a regular expression (‘regex’).

The Metadata Collection interface treats all search strings as regular expressions.

If the author is expecting a string to be regex-matched, they should compose the string as a regular expression and call the Metadata Collection interface.

If an OMAS author wants an exact match there is a set of helper methods in the OMRSRepositoryHelper. These methods support escaping of relatively simple search strings in a manner that is supported by most OMRS repository connectors, including those for repositories that do not support full regular expression syntax. An OMAS author should always use the repository helper methods when they can. For more complex searches, beyond the level supported by the helper methods, an OMAS author should implement their own regular expression, but it is important to be aware that not all repositories will support all regular expressions. The regular expressions provided by the helper are a minimal set that most repositories are able to support. More complex expressions can be used with repositories that have full regex processing, such as the in-memory repository or graph repository.

For example, if an OMAS author wants an exact match of a string, they should call OMRSRepositoryHelper.getExactMatchRegex() which will ‘escape’ the whole string, regardless of content. This helper method will frame the whole string with \Q \E escape characters. It’s OK to call getExactMatchRegex() even if the string value only contains alphanumeric characters and has no regex special characters. However, it should only be used for escaping a single, simple string – don’t use it for a string that already contains either of these escape sequences. Also, don’t use it to build up complex regular expressions.

Here’s an example. Metadata objects frequently have compound names composed of multiple fields with separators. For example, an OMAS may need to retrieve the entity with qualifiedName equal to '[table]EMPLOYEE.[column]FNAME'. Some of these characters are special characters in a regex. If the OMAS needs an exact match, it can call OMRSRepositoryHelper.getExactMatchRegex() to escape the search string. Although the string contains regex special characters, the search will only return an entity with the exact value.

Exact match of a substring

The OMRSRepositoryHelper also provides helper methods that will escape a string and build a regex around it so it will match values that contain, start with, or end with the original string value. These methods combine exact match processing with relatively simple regex substring expressions. If an OMAS needs a more complicated regex, the author should code it directly instead of using the OMRSRepositoryHelper methods.
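
For example, these substring helpers can be used along the following lines:

// Each helper escapes the literal value and wraps it in a simple regex.
String contains   = repositoryHelper.getContainsRegex("EMPLOYEE"); // values containing "EMPLOYEE"
String startsWith = repositoryHelper.getStartsWithRegex("EMP");    // values starting with "EMP"
String endsWith   = repositoryHelper.getEndsWithRegex("FNAME");    // values ending with "FNAME"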

Getting back more than you expected

Using an exact match doesn’t guarantee you will get only one entity, as there may be multiple entities that have a matching property value. Egeria is designed to be distributed and eventually consistent, so the repositories do not enforce uniqueness. Even if a property is ‘unique’ there may be more than one instance with that value within a cohort.

Filtering a search result

If an OMAS searches and gets back a set of entities, it may need to filter the set to identify an individual entity. The filtering might compare each search match property with the instance properties of each returned entity. However, if the OMRSRepositoryHelper methods were used to escape any match properties prior to the search, the OMAS would need to ‘unescape’ those match properties. It could do this by calling the OMRSRepositoryHelper getUnqualifiedLiteralString() method. Alternatively, the OMAS could construct a pair of match properties objects. One object is never escaped and the other is identical except it is escaped just prior to the search. The OMAS would then use the unescaped object for post-search filtering.
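
A minimal sketch of the unescaping approach, with placeholder names:

// The match value was escaped before the search ...
String escaped = repositoryHelper.getExactMatchRegex(qualifiedName);

// ... so recover the literal value to use when filtering the search results.
String literal = repositoryHelper.getUnqualifiedLiteralString(escaped);

for (EntityDetail entity : results) {
    // Compare 'literal' against the relevant property value of each returned
    // entity to identify the specific instance required (comparison omitted).
}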

Using Egeria to Integrate with a Data Virtualization Tool

Category: ODPi Egeria

 

What we want to achieve

For a business user, what is relevant are the business concepts and connections between them. What we want to achieve is to have a consistent and meaningful representation of data resources, along with policies to handle them. We want the end user to have access only to the governed, rich metadata.

How to obtain it?

We will use the Coco Pharmaceuticals database as a sample resource. As a result, the end user will be able to see a business view on top of the underlying resource, as well as the technical view.
By assigning a glossary term to a relational column, we will trigger the creation of two views on top of the parent table:

  • a business view, with all the columns that have business terms associated, using the business term name as the column name
  • a technical view, with all the columns that have business terms associated, using the technical name as the column name

Solution Overview Components

  1. Oracle database containing Coco Pharmaceuticals sample data. We will use table EMPSALARYANALYSIS as an example. Link: https://github.com/odpi/egeria/tree/master/open-metadata-resources/open-metadata-deployment/sample-data/coco-pharmaceuticals
  2. IBM Information Governance Catalog (IGC) as the repository containing glossary terms and the metadata for the above database. This repository is integrated into the Open Metadata world using the proxy
    pattern: https://github.com/odpi/egeria/blob/master/open-metadata-publication/website/open-metadata-integration-patterns/adapter-integration-pattern.md. The proxy is used to translate proprietary formats, events and APIs to Open Metadata standards (and the other way around). Link: https://github.com/odpi/egeria-connector-ibm-information-server
  3. Kafka as event bus
  4. Open Metadata server configured to have Open Metadata Access Services (OMAS) enabled.
    Open Metadata Access Services are sets of APIs targeting certain types of consumers and use cases. For more details, see: https://github.com/odpi/egeria/blob/master/open-metadata-implementation/access-services/README.md
    In the current example we will use two OMASs:
    1. Information View OMAS is responsible for integrating with virtualization and BI tools. For this flow we are using it because we need the structure and context (location) of the table.
    2. Data Platform OMAS is responsible for integration with tools that want to create new data assets. In this case its responsibility is to integrate with the data virtualization tool used in this setup, GaianDB, and model the views created as new data assets.
  5. Atlas repository as a repository integrated with Open Metadata natively, since it implements the Open Metadata APIs, protocols and types. Using the Atlas built-in UI we will explore the entities created to represent the views.
    More details about native pattern can be found here: https://github.com/odpi/egeria/blob/master/open-metadata-publication/website/open-metadata-integration-patterns/native-integration-pattern.md
  6. Virtualizer: the component designed to integrate with the data virtualization tool, and therefore responsible for running the actual queries and statements that create the views. It also produces events representing the views’ structure, along with details about the host and platform creating the views (GaianDB in this case)
    Link: https://github.com/odpi/egeria/blob/master/open-metadata-implementation/governance-servers/virtualization-services/README.md
  7. GaianDB as the data virtualization tool. GaianDB is a lightweight federated database over multiple sources, providing a single centralized view over multiple, heterogeneous back-end data sources. More details can be found here: https://github.com/gaiandb/gaiandb.
    Our setup includes a front-end Gaian node to which Virtualizer will connect, and a back-end Gaian node connected to the Oracle database. Views will be created in the front-end Gaian node and linked to the real database tables from the back-end Gaian node.

All OpenMetadata servers (IGC proxy, OMAS server and Atlas) are configured to be part of the same cohort, therefore they will receive and publish events on the same cohort topic.

Configuring the environment

  1. Setting up the cohort
    At the centre of the setup is the concept of a cohort. A cohort is a collection of metadata repositories sharing metadata using the Open Metadata Repository Services (OMRS).
    Below is a picture describing the cohort defined for this setup. We’ve configured the three members as part of the same cohort. This implies that they listen and publish OMRS events on the same cohort topic, and that their local registry stores hold registration details about all the other members in the cohort. As a result, the repositories can act as one single virtual repository.
  2. Virtualization layer: Virtualizer, the data virtualization tool and the underlying database. Virtualizer is the component designed to interact with GaianDB. Hence its responsibility is to connect to the front-end node (Gaian Node 2) and create the views in this node. The data for these views is located in the Oracle database, but made available through the back-end node (Gaian Node 1) to the other connected nodes (Gaian Node 2 in this scenario).

Flow

  1. First step, triggering the creation of views: a business user assigns a business term to a column in IGC
  2. As a result, a new event is published on the InfosphereEvent topic (an internal IGC topic)
  3. The IGC event is consumed by the proxy and translated to OMRS events and types. A new OMRS event containing details about the business term and column is published on the cohort topic, thus reaching all the members of the cohort.

The event will include:

  • Details about the type of event. In this case the event is a NEW_RELATIONSHIP_EVENT with type SemanticAssignment
  • Mandatory properties for the 2 entities between which the relationship is created. These are necessary to be able to identify the entities uniquely in the Open Metadata (OM) world. This includes the type of each entity (RelationalColumn and GlossaryTerm), the GUID of each entity as unique identifier in the repository, and also the qualifiedName as unique identifier at entity type level.
  • Other details such as event provenance, originator, auditing and versioning info are also defined in the event.

Link: https://github.com/odpi/egeria/blob/master/open-metadata-publication/website/java-events/new-semantic-assignment-OMRS-Event.json

4. The event is picked up by both Atlas and the OMAS server, because they are cohort members and not originators of the initial event. As a result of processing this event, the Atlas cohort member will create a SEMANTIC_ASSIGNMENT relationship between the entities representing the column and the glossary term.

The OMAS server will also pick up the event. Because it is configured to have access services enabled, all enabled access service listeners will receive this event and either process it or discard it, based on the logic and use cases for each service. Starting from the column GUID as unique identifier in the OM world, Information View OMAS will retrieve all entities describing the table, thus building the full context. This includes host details, connector type, database name, schema name, table name, columns and business terms linked to the columns, and column constraints such as primary key and foreign key.

This event containing the full context is published to the Information View Out topic as input for the Virtualizer component.

Link: https://github.com/odpi/egeria/blob/master/open-metadata-publication/website/java-events/table-context-EMPSALANALYSIS.json

5. Virtualizer processes the event and creates the 2 views in the front-end node (Gaian Node 2). The screenshot below shows the business view created in GaianDB:

As seen in the picture, the view includes the columns that have business terms assigned. The column names are the business term names, because a business user is not interested in a technical name (like FNAME); what is relevant is the actual meaning: the business term name First Name.

6. Virtualizer publishes the events describing the views to the DataPlatform IN topic.

Sample event can be found here: https://github.com/odpi/egeria/blob/master/open-metadata-publication/website/java-events/information-view-EMPSALANALYSIS.json

Please note that this event also contains details about the host and platform where the view was created, along with the view columns, the business terms associated with these columns, and the underlying database columns.

7. DataPlatform OMAS consumes the events from the DataPlatform IN topic

As a result, the DataPlatform OMAS will issue calls to the enterpriseConnector to create the entities and relationships modelling the views. The enterpriseConnector is responsible for triggering a federated request by calling the connectors stored in the server’s registry store, thereby creating the entities and relationships modelling the view.

Because the current integration for IGC doesn’t support creation of entities/relationships, the entities and relationships defining the view are created only in Atlas. The business and technical views are represented in Open Metadata as RelationalTable entities, and all the columns have a relationship to the business term and also to the real database column. The data virtualization tool (GaianDB) and database asset (Gaian database) are also modelled by the entities SoftwareServer, Endpoint, Connection, ConnectorType and Database.

In the picture below, all view columns are linked to RelationalTableType.

The screenshot below displays the relationship meaning (or semantic assignment) between the column and the glossary term.

Similarly, the picture below displays the connection between the view column and the actual database source column through the relationship queryTarget.

OpenDS4All

Data science for all – an open source approach to education

Categories: Blog, ODPi OpenDS4All

Editor’s Note: This blog post from Ana Echeverri is reposted from the IBM Global Data Science Forum blog

Today is an exciting day for me. After months of hard work, IBM, the University of Pennsylvania, and the Linux Foundation are announcing an innovative, first-of-a-kind open source project that will enable universities around the world to build Data Science programs faster.

With IBM’s investment and industry expertise, the University of Pennsylvania’s long-standing academic leadership, and the Linux Foundation as a premier open source consortium, we are creating a curriculum kit composed of a set of open source building blocks for teaching the core concepts of data science in undergraduate and graduate programs. These building blocks are based on Python and open source tools and frameworks, and include slides, documentation, code, and data sets that can be adopted or updated by anyone.

This idea of open source Data Science education is personal to me. Access to education changed my life. Coming from a small town in Colombia, South America, education gave me the opportunity to work with cutting-edge Data Science and AI technologies at one of the best companies in the world (IBM). I believe this project will provide a foundation of building blocks for schools to supplement, strengthen and start up their data science programs. And most importantly, because this is open source, it is available to any institution on earth, providing more opportunities for learners to participate in the AI Economy like I did.

When I first started this project, I met with universities in different regions of the world and a common theme emerged: starting a Data Science program from scratch is incredibly difficult, and universities need educational materials to accelerate their efforts. This was not only encouraging but validated the need: there is a demand worldwide and this concept of open source education could reach across oceans and to our local community colleges.

By making a “starter set” of training materials available and providing guidance on how to build a Data Science program, IBM and cross-industry partners and educators working together can help accelerate the availability of skills building programs around the world.

It is the beginning of a new era for Data Science Education.

The project is currently in incubation as IBM and UPenn create the initial set of materials to contribute. The project will officially launch in early 2020. To get early insights and stay up to date with this project, please register here.

Getting started with Egeria notebooks using docker

Category: ODPi Egeria

Do you like getting hands-on with a new technology, yet also want to understand the concepts? Concerned it will take too long to get started?

Wait no longer! You can now experiment with Egeria by making use of our new Jupyter notebooks installed via Docker. Within minutes (plus download time) you’ll be happily running REST API calls against a live Egeria environment, and gaining an understanding of Egeria’s concepts.

In this first blog post I’ll take you through getting set up with a lab environment and running your first notebook.

Prerequisites

Before we get started on setting up Egeria, you’ll need access to a few things:

  • docker – the environment in which to run Egeria
  • git – the source code control tool to get files needed

Setting up docker

Docker makes it easy to run pre-created environments in ‘containers’ which are isolated from the host machine, such as your laptop. The instructions here were tested with ‘Docker for Mac’, but you can also use ‘Docker for Windows’, or docker installed on Linux.

Note: The containers are Linux containers built for Intel 64-bit architecture, so they won’t work on ARM, nor will they work as Windows containers.

Once you’ve installed docker, make sure it’s running as covered in the docs above. If using Windows or Mac, you should see a docker icon (a whale) on the toolbar.

Setting up git

git is the tool we use to manage our code. If you don’t have it installed, install it from the git website (easiest), or from your Linux distribution or Homebrew. No special configuration is needed.

Retrieving the Egeria code

You’re now ready to retrieve the Egeria code. Whilst we only need a few files for the docker work, having the code locally will be useful for further exercises and following along with other blog posts.

Open up a command window (mac, windows or linux), switch to a suitable directory and type:

git clone https://github.com/odpi/egeria 

This will pull down the Egeria code locally to your machine.

Running the notebooks

We’re now ready to run the notebook. To do this we will use a feature of docker called ‘docker-compose’. This is a simple approach to running multiple containers (think of these as applications or services) together.

For this example we are running a Jupyter notebook server alongside the Egeria servers and supporting services they need, all defined in a single docker-compose file.

To get started with the docker compose environment (all one line – and replace / with \ for Windows):


cd egeria/open-metadata-resources/open-metadata-deployment/compose/tutorials
docker-compose -f egeria-tutorial.yaml up

At this point you’ll notice a lot of activity. Once it has settled down, go to a web browser and open http://localhost:18888. You should see a Jupyter notebook environment open, and a list of our current labs will be shown in the left-hand folder tree.

If you don’t see the UI appear, press CTRL-C, and retry the docker compose command. Sometimes a slower network download can cause things not to start properly the first time.

Working through the notebooks

In the Jupyter UI navigate to ‘administration’ and open up the `read-me-first` notebook. This introduces you to how to set up an Egeria environment for a fictional company, ‘Coco Pharmaceuticals’.

The large blue bar is effectively a cursor. It shows where you are in the notebook. Read each paragraph in turn and then hit the ‘play’ button to progress through the notebook. You can also press SHIFT-ENTER to run the current step and move to the next one. As well as text, some paragraphs contain code which is executed live against a real Egeria server in your docker environment.

Once you’ve worked through this notebook try ‘managing-servers’ which goes into more specifics of how to start and stop servers. Other tutorials get into topics such as accessing assets.

Shutting down the environment

docker-compose -f egeria-tutorial.yaml down

Updating the environment

Each time the environment is started, the same code will be run, since the container images are only downloaded the first time they are used.

In order to refresh the containers and run the latest code (recommended), run the following before starting the environment again:

docker-compose -f egeria-tutorial.yaml pull

Further information

If you have any problems running the notebooks, please reach out to the Egeria community.

The containers we used above can be used in other ways too – stay tuned to the blog to find out more.

How Do I Teach My Second Grade Kid What AI Is?

Categories: Blog, ODPi BI and AI

By Cupid Chan, CTO, Index Analytics

I recently took my kids to Hershey Park in Pennsylvania. In case you haven’t heard of it, it’s just a normal amusement park with rides, and long lines. As we were waiting in line, my son asked, “Dad, what are you doing at work?”

I said, “I help my clients to define KPIs, and then try to apply Naive Bayes to predict the outcome. If the result is not good, we may need to build a neural network, and test it again.”

Do you really think that’s the answer I gave my son? 

OF COURSE NOT!

Not because what I said was wrong, but because he is simply not the right audience for that type of response. More importantly, I don’t want him to think, “My dad is crazy and I’d better not ask him anything again.” So, I need to come up with an answer in a language that he can understand.

“If a computer can do work but no one knows whether it’s you doing the work or the computer, that’s AI.” – a basic principle of AI proposed by Alan Turing.

“Great! I can then use AI to do my homework and my teacher would not know that it’s not me doing that!”

Supervised Learning 

“Hmm… Do you remember how you taught your younger sister the difference between a pen and an apple? You hold up a pen in front of her so she can see it and say, ‘pen.’ And you hold up an apple so she can see it and say, ‘apple.’ And you repeat this. Sooner or later, you expect her to understand the long pointy thing is a pen. And the red, round thing is an apple.”

Long, pointed, round, red. These are Features in Machine Learning. And “Pen” or “Apple” are Labels. Combined, this is Supervised Learning: by being shown examples repeatedly, a computer comes to understand that different Features are associated with different Labels.

“Dad, I remember I saw a guy teaching people this on YouTube, too!”

PIKOTARO – PPAP (Pen Pineapple Apple Pen) (Long Version) [Official Video]

Well, the song is funny, but it is not related to Supervised Learning. But if it plants the concept of Supervised Learning in a child’s mind, why not let it be?

In the real world, Supervised Learning can help in many different ways. One of them is distinguishing a cancer cell from a normal cell. In this case, the computer is the “child” and the doctor is the “parent.” By showing examples repeatedly, the doctor trains the computer to distinguish the patterns of a normal cell and a cancer cell.

Unsupervised Learning

You may have heard about the Law of Entropy, or the Second Law of Thermodynamics. In general, unless you put in energy to keep the situation in its current state, the whole condition will just become messier over time.

You can apply the very same law to a kid’s playground. Unless you really put in effort to keep toys tidy, the toys will not automatically go back to their original positions. At my home, my mother-in-law helps the kids keep the play areas organized. Once, when she went to Hong Kong for a vacation, the play areas became more disorganized day after day. Finally, my wife had to step in and demand that the kids clean up before grandmother returned. She did not give exact instructions. She just demanded they clean up!

Guess what happened in the next few hours? The kids put all the four-wheeled, boxy-shaped things in one area, and we called it “Cars.” And all the fluffy stuff was put together in another area, and we called it “Stuffed Animals.” And then they put all the blocks that can be stacked up together in some boxes and named them “Legos.”

They did not get any specific instructions or rules to decide what should go where. But somehow they figured out the similarities and differences. In Machine Learning, this is called Unsupervised Learning.

This is when the computer is given a lot of data points and figures out the pattern by itself. In the real world, Unsupervised Learning can be used in customer segmentation. There is a lot of information and data about a lot of customers. You don’t tell the computer who should be grouped with whom; this is figured out by Unsupervised Learning. Traditionally, this is done by an expert who observes different patterns, like age, spending pattern, where you live, salary… and then tries to group the types of customers together. And now, we have the machine playing the role of the expert, able to scan through millions of records in a few seconds, which is impossible for any human being.

Reinforcement Learning

When dealing with kids, it’s not always best to just keep telling them and keep showing them the proper examples. At the same time, it’s not very effective to give no instructions and let them figure out everything by themselves.

It’s a common practice in teaching kids to reward them when they do something good. And when they do something bad, you punish them. This is intended to reinforce certain behaviors. In Machine Learning, this is known as Reinforcement Learning.

When a computer performs the way that you want, you add a point. When it fails to do what you want, you reduce a point. The computer therefore knows what to do to gain points. 

In the real world, Reinforcement Learning is applied heavily in Robotics. For example, a robot is trying to walk a straight line. It may make it or it may fall down. Whenever the robot falls down, you reduce a point. And whenever the robot successfully makes one step, you add one point. There are many motors and sensors on a robot, and all of them are collecting data for the system. The robot learns what kind of motor speed, what kind of angle is needed in order to keep walking in a straight line and avoid falling.

2 Types of Measurement

2 Popular Questions by Kids – Key Approaches in Machine Learning

Kids like to ask strangers, “How old are you?” and “Are you a boy or a girl?”

“How old are you?” is asking for a number. It’s Regression.

“Are you a boy or a girl?” is Classification: looking for an outcome from a pre-defined category. Both are important concepts in Machine Learning.

3 Ways to Learn

Kids observe the world around them. They come up with certain rules. They will propose a result, and they will be corrected by adults, which makes the rules get better and better.

Compare this to the old way of programming: the developer observes the world. They code rules using rule-based algorithms, and they come up with some results. Based on these, they change or modify the rules.

In AI, it’s a little bit different. The developer creates the AI algorithm and has it create the rules. The algorithm comes up with a model and continues to train it. The model then tries to predict the result, and we see whether it is accurate or not. The key here is that the algorithm keeps modifying the model using more data, without the developer being involved.

That’s the beauty of AI!

No Right or Wrong. Just Right or Left! 

Final question: What are the similarities and differences between Tesla and Uber? They are both in the automobile industry. But one company, Tesla, creates new technology to help revolutionize the whole car industry, while Uber uses existing technology (like mapping and mobile apps) to create a new business model.

So the power of AI is not just in making algorithms. It can also be in using existing algorithms to build new ways of doing business. One builds the technology; the other utilizes it.

Remember my son who was thinking about ways to get his homework done? Ultimately, I would be equally proud if he came up with an algorithm that could do his homework and successfully fool his teacher or if he utilized existing algorithms to do the same thing. Both are important new ways of adopting AI to solve problems. 

“There is no Right or Wrong, only Right or Left. But no matter which direction you pick, be persistent and you will cross the finish line of success via either route.” – Cupid Chan tweet on Nov 28, 2018

The content of this blog has been presented at a few national and international conferences, such as the Open Source Summit in Shanghai, China and the MicroStrategy Federal Summit in Washington, DC. I also captured this in my very first YouTube channel video, which you can find here: https://www.youtube.com/watch?v=dh9xz4SBukE&t=13s

Twitter: @cupidckchan

LinkedIn: www.linkedin.com/in/cupidchan/


Implementing an Open Metadata Connector

Categories: Blog, ODPi Egeria, Tech Deep Dive

Eager to integrate your own metadata repository into the Egeria ecosystem, but not sure where to start? This article walks through how to do just that: implementing an open metadata repository connector according to the standards of ODPi Egeria.

The sections below outline the steps involved.

Introduction

Integrating a metadata repository into the Open Metadata ecosystem involves coding an Open Metadata Collection Store Connector. These are Open Connector Framework (OCF) connectors that define how to connect to and interact with a metadata repository.

Open Metadata Collection Store Connectors are typically comprised of two parts:

  1. The repository connector: which provides a standard repository interface that communicates using the Open Metadata Repository Services (OMRS) API and payloads.
  2. The event mapper connector: which captures events when metadata has changed in the metadata repository and passes these along to the Open Metadata Repository Services (OMRS) cohort.

The event mapper connector often calls the repository connector to translate the repository-native events into Egeria’s OMRS events.

While various patterns can be used to implement these, perhaps the simplest and most loosely-coupled is the adapter. The adapter approach wraps the proprietary interface(s) of the metadata repository to translate these into OMRS calls and payloads. In this way, the metadata repository can communicate as if it were an open metadata repository.

The remainder of this article will walk through:

  • implementing such an adapter pattern as a connector, and
  • using the resulting connector through the proxy capabilities provided by the core of Egeria.

1. Design work

Designing before implementing

Before delving straight into the implementation of a connector, you really need to start with a level of design work. Fundamentally this will involve two steps:

  1. Mapping to the meta-model concepts of Egeria: in particular Entities, Classifications and Relationships.
  2. Mapping to the actual open metadata types of Egeria: e.g. GlossaryTerm, GlossaryCategory, RelationalColumn, and so on.

Map to the Egeria meta-model concepts

The best place to start with the design work is to understand the meta-model of Egeria itself. Consider how your metadata repository will map to the fundamental Egeria metadata concepts: Entities, Classifications, and Relationships.

When implementing the code described in the remainder of this article, you’ll be making use of and mapping to these fundamental Egeria concepts. Therefore, it is well worth your time now understanding them in some detail. This is before even considering specific instances of these types like GlossaryTerm or GlossaryCategory.

Meta-model mapping may be quite a straightforward conceptual mapping for some repositories. For example, Apache Atlas has the same concepts of Entities, Classifications and Relationships all as first-class objects.

On the other hand, not all repositories do. For example, IBM Information Governance Catalog (IGC) has Entities, and a level of Relationships and Classifications — but the latter two are not really first-class objects (i.e. properties and values cannot exist on them).

Therefore you may need to consider

  • whether to attempt to support these constructs in your mappings, and
  • if so, how to prescriptively represent them (if they are not first-class objects).

For example, in the implementation of the sample IGC connector we suggest using categories with specific names in IGC to represent certain classifications. Additionally, one of the reasons for implementing a read-only connector is that we can still handle relationships without any properties: by simply having the properties of any Egeria relationships we translate from IGC all be empty.

Map to the Egeria open metadata types

Once you have some idea for how to handle the mapping to the meta-model concepts, check your thinking by working through a few examples. Pick a few of the open metadata types and work out on paper how they map to your metadata repository’s pre-existing model. Common areas to do this would be e.g. GlossaryTerm, GlossaryCategory for glossary (business vocabulary) content; RelationalColumn, etc for relational database structures; and so on.

Most of these should be fairly straightforward after you have an approach for mapping to the fundamental meta-model concepts.

Then you’ll also want to decide how to handle any differences in types between the open metadata types and your repository’s pre-existing types:

  • Can your metadata repository be extended with new types?
  • Can your metadata repository’s pre-existing types be extended with new properties?
  • What impacts might be caused to repositories (and metadata instances) that already exist if you add to or extend the types?
  • What impacts will this have on your UI or how users interact with these extensions?

Your answers to these questions will inevitably depend on your specific metadata repository, but should help you decide on what approach you’d like to take:

  • Ignore any open metadata types that do not map to your pre-existing types.
  • Add any Egeria open metadata types that do not exist in your repository.
  • Add Egeria open metadata properties to your pre-existing types when Egeria has additional properties that do not yet exist in your type(s).
  • Implement a read-only connection (possibly with some hard-coding of property values) for types that are partially map-able, but not easily extended to support the full set of properties defined on the open metadata type.
  • and so on.

2. Pre-requisites

Creating your own connector project

Implementing an adapter can be greatly accelerated by using the pre-built base classes of Egeria. Therefore building a connector using Java is likely the easiest way to start.

This requires an appropriate build environment comprised of both Java (minimally v1.8) and Maven.

Setup a project

Egeria has been designed to allow connectors to be developed in projects independently from the core itself. Some examples have already been implemented, which could provide a useful reference point as you proceed through this walkthrough.

Start by defining a new Maven project in your IDE of choice. In the root-level POM be sure to include the following:

<properties>
    <open-metadata.version>1.1-SNAPSHOT</open-metadata.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.odpi.egeria</groupId>
        <artifactId>repository-services-apis</artifactId>
        <version>${open-metadata.version}</version>
        <scope>compile</scope>
    </dependency>
    <dependency>
        <groupId>org.odpi.egeria</groupId>
        <artifactId>open-connector-framework</artifactId>
        <version>${open-metadata.version}</version>
        <scope>compile</scope>
    </dependency>
</dependencies>

Naturally change the version to whichever version of Egeria you’d like to build against. The dependencies listed ensure you’ll have the necessary portion of Egeria to build your connector against.

3. Implement the repository connector

Implementing your connector in your own project

The repository connector exposes the ability to search, query, create, update and delete metadata in an existing metadata repository. As such, it will be the core of your adapter.

You can start to build this within your new project by creating a new Maven module called something like adapter. Within this adapter module implement the following:

Implement an OMRSRepositoryConnectorProvider

Start by writing an OMRSRepositoryConnectorProvider specific to your connector, which extends OMRSRepositoryConnectorProviderBase. The connector provider is a factory for its corresponding connector. Much of the logic needed is coded in the base class, and therefore your implementation really only involves defining the connector class and setting this in the constructor.

For example, the following illustrates this for the Apache Atlas Repository Connector:

package org.odpi.egeria.connectors.apache.atlas.repositoryconnector;

import org.odpi.openmetadata.frameworks.connectors.properties.beans.ConnectorType;
import org.odpi.openmetadata.repositoryservices.connectors.stores.metadatacollectionstore.repositoryconnector.OMRSRepositoryConnectorProviderBase;

public class ApacheAtlasOMRSRepositoryConnectorProvider extends OMRSRepositoryConnectorProviderBase {

    static final String  connectorTypeGUID = "7b200ca2-655b-4106-917b-abddf2ec3aa4";
    static final String  connectorTypeName = "OMRS Apache Atlas Repository Connector";
    static final String  connectorTypeDescription = "OMRS Apache Atlas Repository Connector that processes events from the Apache Atlas repository store.";

    public ApacheAtlasOMRSRepositoryConnectorProvider() {
        Class connectorClass = ApacheAtlasOMRSRepositoryConnector.class;
        super.setConnectorClassName(connectorClass.getName());
        ConnectorType connectorType = new ConnectorType();
        connectorType.setType(ConnectorType.getConnectorTypeType());
        connectorType.setGUID(connectorTypeGUID);
        connectorType.setQualifiedName(connectorTypeName);
        connectorType.setDisplayName(connectorTypeName);
        connectorType.setDescription(connectorTypeDescription);
        connectorType.setConnectorProviderClassName(this.getClass().getName());
        super.connectorTypeBean = connectorType;
    }
}

Note that you’ll need to define a unique GUID for the connector type, and a meaningful name and description. Really all you then need to implement is the constructor, which can largely be a copy / paste for most adapters. Just remember to change the connectorClass to your own, which you’ll implement in the next step (below).
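
If you need to mint such a GUID, Java’s built-in UUID class is one simple way to do it:

import java.util.UUID;

// Run once and paste the output into your connector provider as connectorTypeGUID.
System.out.println(UUID.randomUUID());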

Implement an OMRSRepositoryConnector

Next, write an OMRSRepositoryConnector specific to your connector, which extends OMRSRepositoryConnector. This defines the logic to connect to and disconnect from your metadata repository. As such the main logic of this class will be implemented by:

  • Overriding the initialize() method to define any logic for initializing the connection: for example, connecting to an underlying database, starting a REST API session, etc.
  • Overriding the setMetadataCollectionId() method to create an OMRSMetadataCollection for your repository (see next step below).
  • Overriding the disconnect() method to properly cleanup / close such resources.

Whenever possible, it makes sense to try to re-use any existing client library that might exist for your repository. For example, Apache Atlas provides a client through Maven that we can use directly. Re-using it saves us from needing to implement and maintain various beans for the (de)serialization of REST API calls.

The following illustrates the start of such an implementation for the Apache Atlas Repository Connector:

package org.odpi.egeria.connectors.apache.atlas.repositoryconnector;

import org.apache.atlas.AtlasClientV2;
import org.apache.atlas.AtlasServiceException;
import org.apache.atlas.model.SearchFilter;
import org.apache.atlas.model.typedef.AtlasTypesDef;
import org.odpi.openmetadata.frameworks.connectors.properties.ConnectionProperties;
import org.odpi.openmetadata.frameworks.connectors.properties.EndpointProperties;
import org.odpi.openmetadata.repositoryservices.connectors.stores.metadatacollectionstore.repositoryconnector.OMRSRepositoryConnector;
import org.odpi.openmetadata.repositoryservices.ffdc.exception.OMRSRuntimeException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ApacheAtlasOMRSRepositoryConnector extends OMRSRepositoryConnector {

    private static final Logger log = LoggerFactory.getLogger(ApacheAtlasOMRSRepositoryConnector.class);

    private String url;
    private AtlasClientV2 atlasClient;
    private boolean successfulInit = false;

    public ApacheAtlasOMRSRepositoryConnector() { }

    @Override
    public void initialize(String               connectorInstanceId,
                           ConnectionProperties connectionProperties) {
        super.initialize(connectorInstanceId, connectionProperties);

        final String methodName = "initialize";

        // Retrieve connection details
        EndpointProperties endpointProperties = connectionProperties.getEndpoint();
        // ... check for null and handle ...
        this.url = endpointProperties.getProtocol() + "://" + endpointProperties.getAddress();
        String username = connectionProperties.getUserId();
        String password = connectionProperties.getClearPassword();

        this.atlasClient = new AtlasClientV2(new String[]{ this.url }, new String[]{ username, password });

        // Test REST API connection by attempting to retrieve types list
        try {
            AtlasTypesDef atlasTypes = atlasClient.getAllTypeDefs(new SearchFilter());
            successfulInit = (atlasTypes != null && atlasTypes.hasEntityDef("Referenceable"));
        } catch (AtlasServiceException e) {
            log.error("Unable to retrieve types from Apache Atlas.", e);
        }

        if (!successfulInit) {
            ApacheAtlasOMRSErrorCode errorCode = ApacheAtlasOMRSErrorCode.REST_CLIENT_FAILURE;
            String errorMessage = errorCode.getErrorMessageId() + errorCode.getFormattedErrorMessage(this.url);
            throw new OMRSRuntimeException(
                    errorCode.getHTTPErrorCode(),
                    this.getClass().getName(),
                    methodName,
                    errorMessage,
                    errorCode.getSystemAction(),
                    errorCode.getUserAction()
            );
        }

    }

    @Override
    public void setMetadataCollectionId(String metadataCollectionId) {
        this.metadataCollectionId = metadataCollectionId;
        if (successfulInit) {
            metadataCollection = new ApacheAtlasOMRSMetadataCollection(this,
                    serverName,
                    repositoryHelper,
                    repositoryValidator,
                    metadataCollectionId);
        }
    }

}

This has been abbreviated from the actual class for simplicity; however, note that as part of the initialize() it may make sense to test out the parameters received for configuring the connection, to make sure that a connection to your repository can actually be established before proceeding any further.

(This is also used in this example to setup a flag successfulInit to indicate whether connectivity was possible, so that if it was not we do not proceed any further with setting up the metadata collection and we allow the connector to fail immediately, with a meaningful error.)
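
The example also omits the disconnect() override. For a stateless REST client like this one, a minimal sketch (an illustration, not the connector’s actual implementation) might be:

@Override
public void disconnect() throws ConnectorCheckedException {
    super.disconnect();
    // No sessions or pooled connections to close for a stateless REST client;
    // simply release the reference to the Atlas client.
    this.atlasClient = null;
}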

You may want to wrap the metadata repository client’s methods with your own methods in this class as well. Generally think of this class as “speaking the language” of your proprietary metadata repository, while the next class “speaks” Egeria.

Implement an OMRSMetadataCollection

Finally, write an OMRSMetadataCollection specific to your repository, which extends OMRSMetadataCollectionBase. This can grow to be quite a large class, with many methods, but is essential for the participation of your metadata repository in a broader cohort. In particular, it is heavily leveraged by Egeria’s Enterprise Connector to federate actions against your metadata repository. As such, this is how your connector “speaks” Egeria (open metadata).

Ideally your implementation should override each of the methods defined in the base class. To get started:

  1. Override the addTypeDef() method. For each TypeDef this method should either extend your metadata repository to include this TypeDef, configure the mapping from your repository’s types to the open metadata types, or throw a TypeDefNotSupportedException. (For those that are implemented it may be helpful to store these in a class member for comparison in the next step.)
  2. Override the verifyTypeDef() method, which can check that the types you have implemented (above) conform to the open metadata TypeDef received (i.e. that all properties are available, of the same data type, etc), and that if none have yet been listed as implemented, false is returned (this will cause addTypeDef() above to automatically be called).
  3. Override the getEntityDetail() method that retrieves an entity by its GUID.

Note that there are various options for implementing each of these. Which route to take will depend on the particulars of your specific metadata repository:

  • In the sample IBM InfoSphere Information Governance Catalog Repository Connector the mappings are defined in code. This approach was used because IGC does not have first-class Relationship or Classification objects. Therefore, some complex logic is needed in places to achieve an appropriate mapping. Furthermore, if a user wants to extend the logic or mappings used for their particular implementation of IGC, this approach allows complete flexibility to do so. (A developer simply needs to override the appropriate method(s) with custom logic.)
  • The sample Apache Atlas Repository Connector illustrates a different approach. Because the TypeDefs are quite similar to those of Egeria, it is easier to map more directly through configuration files. A generic set of classes can be implemented that use these configuration files to drive the specifics of each mapping. In this case, simple JSON files were used to define the OMRS name of a particular object or property and the corresponding Atlas entity / property name to which it should be mapped (a hypothetical sketch of such an entry follows this list). While this allows for much more quickly adding new mappings for new object types, it is far less flexible than the code-based approach used for IGC. (It is only capable of handling very simple mappings: anything complex would either require the definition of a complicated configuration file or still resorting to code).
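
Purely as an illustration of that configuration-driven approach, an entry in such a mapping file might take a shape like the following (a hypothetical sketch, not the connector’s actual file format):

{
  "omrs":  "GlossaryTerm",
  "atlas": "AtlasGlossaryTerm",
  "propertyMappings": {
    "displayName": "name",
    "summary":     "shortDescription"
  }
}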

Once these minimal starting points are implemented, you should be able to configure the OMAG Server Platform as a proxy to your repository connector by following the instructions in the next step.

Important: this will not necessarily be the end-state pattern you intend to use for your repository connector. Nonetheless, it can provide a quick way to start testing its functionality.

This very basic, initial scaffold of an implementation allows:

  • a connection to be instantiated to your repository, and
  • translation between your repository’s representation of metadata and the open metadata standard types.

4. Package your connector

Packaging your connector

To make your connector available to run within the OMAG Server Platform, you can package it into a distributable .jar file using another Maven module, something like distribution.

In this module’s POM file include your adapter module (by artifactId) as a dependency, and consider using the maven-shade-plugin to define just the necessary components for your .jar file. Since it should only ever be executed as part of an Egeria OMAG Server Platform, your .jar file does not need to re-include all of the underlying Egeria dependencies.

For example, in our Apache Atlas Repository Connector we only need to include the adapter module itself and the base dependencies for Apache Atlas’s Java client (all other dependencies like Egeria core itself, the Spring framework, etc will already be available through the Egeria OMAG Server Platform):

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>egeria-connector-apache-atlas</artifactId>
        <groupId>org.odpi.egeria</groupId>
        <version>1.1-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>egeria-connector-apache-atlas-package</artifactId>

    <dependencies>
        <dependency>
            <groupId>org.odpi.egeria</groupId>
            <artifactId>egeria-connector-apache-atlas-adapter</artifactId>
            <version>${open-metadata.version}</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>${maven-shade.version}</version>
                <executions>
                    <execution>
                        <id>assemble</id>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <artifactSet>
                                <includes>
                                    <include>org.odpi.egeria:egeria-connector-apache-atlas-adapter</include>
                                    <include>org.apache.atlas:atlas-client-common</include>
                                    <include>org.apache.atlas:atlas-client-v1</include>
                                    <include>org.apache.atlas:atlas-client-v2</include>
                                    <include>org.apache.atlas:atlas-intg</include>
                                    <include>org.apache.hadoop:hadoop-auth</include>
                                    <include>org.apache.hadoop:hadoop-common</include>
                                    <include>com.fasterxml.jackson.jaxrs:jackson-jaxrs-base</include>
                                    <include>com.fasterxml.jackson.jaxrs:jackson-jaxrs-json-provider</include>
                                    <include>com.fasterxml.jackson.module:jackson-module-jaxb-annotations</include>
                                    <include>com.sun.jersey:jersey-client</include>
                                    <include>com.sun.jersey:jersey-core</include>
                                    <include>com.sun.jersey:jersey-json</include>
                                    <include>com.sun.jersey.contribs:jersey-multipart</include>
                                    <include>javax.ws.rs:jsr311-api</include>
                                </includes>
                            </artifactSet>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

</project>

Of course, you do not need to use the maven-shade-plugin to accomplish such bundling: feel free to use a Maven assembly or any other Maven technique instead.

Building and packaging your connector should then be as simple as running the following from the root of your project tree:

$ mvn clean install

Working out exactly which dependencies to include when you are using an external client like Apache Atlas’s can be a little tricky. Starting small will inevitably result in various errors about classes not being found: during the build, the shade plugin logs every class it considered and whether each was included or excluded, and you can use that output to make educated guesses about which artifacts still need to be added. (Ideally you’ll have a simple, single jar file / dependency you can include directly instead of needing to work through this, but that won’t always be the case.)
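
One further aid, standard Maven tooling rather than anything Egeria-specific, is the dependency report, run from your adapter module’s directory:

$ mvn dependency:tree

This prints the module’s full transitive dependency tree, letting you map the package of a missing class back to the artifact that provides it, and add that artifact to the <includes> list above.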

Again, since this connector will only ever be used alongside the existing OMAG Server Platform, this approach avoids ending up with a .jar file that re-includes the entirety of the Egeria OMAG Server Platform (and its dependencies). Instead, your minimal .jar file can simply be loaded at startup of the core OMAG Server Platform and configured through the REST calls covered in the following steps.

Of course, if you intend to embed the connector elsewhere or otherwise implement your own server, the packaging mechanism will likely be different. However, as mentioned in the previous step, this should provide a quick and easy initial way of testing the connector’s functionality against the core of Egeria.

5. Start up the OMAG Server Platform with your connector

Assuming you’ve built your connector .jar file using the approach outlined above, you’ll now have a .jar file under the distribution/target/ directory of your project: for the Apache Atlas example, this would be distribution/target/egeria-connector-apache-atlas-package-1.1-SNAPSHOT.jar.

When starting up Egeria’s OMAG Server Platform, we need to point to this .jar file using either the LOADER_PATH environment variable or a -Dloader.path= command-line argument to the server start command:

$ export LOADER_PATH=..../distribution/target/egeria-connector-apache-atlas-package-1.1-SNAPSHOT.jar
$ java -jar server-chassis-spring-1.1-SNAPSHOT.jar

or

$ java -Dloader.path=..../distribution/target/egeria-connector-apache-atlas-package-1.1-SNAPSHOT.jar -jar server-chassis-spring-1.1-SNAPSHOT.jar

Either startup approach should make your connector available to the OMAG Server Platform for connecting to your metadata repository. You may also want to set the LOGGING_LEVEL_ROOT environment variable to define a more granular logging level for your initial testing, e.g. export LOGGING_LEVEL_ROOT=INFO before running the startup command above, to get more detailed information during startup. (You can also set a similar variable to get even deeper information for just your portion of the code by using your unique package name, e.g. export LOGGING_LEVEL_ORG_ODPI_EGERIA_CONNECTOR_X_Y_Z=DEBUG.)
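
Putting these together, a minimal startup sketch for initial testing might look like the following (the paths are illustrative; adjust them to wherever your packaged connector and the server chassis .jar files reside):

$ export LOADER_PATH=distribution/target/egeria-connector-apache-atlas-package-1.1-SNAPSHOT.jar
$ export LOGGING_LEVEL_ROOT=INFO
$ java -jar server-chassis-spring-1.1-SNAPSHOT.jar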

Then configure the OMAG Server Platform to use your connector. Note that the configuration and startup sequence is important.

Start with just the following:

Enable the OMAG Server as a repository proxy

Enable the OMAG Server as a repository proxy by specifying your canonical OMRSRepositoryConnectorProvider class name for the connectorProvider={javaClassName} parameter and POSTing to:

http://egeriahost:8080/open-metadata/admin-services/users/myself/servers/test/local-repository/mode/repository-proxy/connection

For example, for Apache Atlas we would POST with a payload like the following:

{
  "class": "Connection",
  "connectorType": {
    "class": "ConnectorType",
    "connectorProviderClassName": "org.odpi.egeria.connectors.apache.atlas.repositoryconnector.ApacheAtlasOMRSRepositoryConnectorProvider"
  },
  "endpoint": {
    "class": "Endpoint",
    "address": "{{atlas_host}}:{{atlas_port}}",
    "protocol": "http"
  },
  "userId": "{{atlas_user}}",
  "clearPassword": "{{atlas_password}}"
}
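
Any REST client can issue these calls; for example with curl, assuming the payload above has been saved to a file named proxy-connection.json (a purely illustrative file name):

$ curl -X POST -H "Content-Type: application/json" \
    --data @proxy-connection.json \
    "http://egeriahost:8080/open-metadata/admin-services/users/myself/servers/test/local-repository/mode/repository-proxy/connection"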

Start the server instance

Start the OMAG Server instance by POSTing to:

http://egeriahost:8080/open-metadata/admin-services/users/myself/servers/test/instance
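
For example, with curl:

$ curl -X POST "http://egeriahost:8080/open-metadata/admin-services/users/myself/servers/test/instance"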

During server startup you should then see various messages related to the metadata type registration process as the open metadata types are checked against your repository. (These in turn call the methods you’ve implemented in your OMRSMetadataCollection.) You might naturally need to iron out a few bugs in those methods before proceeding further…

6. Test your connector’s basic operations

Each time you change your connector code, you’ll naturally want to re-build it (mvn clean install) and restart the OMAG Server Platform. If you are not changing any of the configuration, you can simply restart the OMAG Server Platform and re-run the POST to start the server instance (the last step above). If you need to change something in the configuration itself, it will be best to follow the reset cycle sketched below:

  1. Stop the OMAG Server Platform.
  2. Delete the configuration document (a file named something like omag.server.test.config).
  3. Start the OMAG Server Platform again.
  4. Re-run both steps above (enabling the OMAG Server as a proxy, and starting the instance).
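
Assuming the default file-based configuration store, which writes the configuration document to the platform’s working directory (adjust the path if you have configured a different store or location), the cycle after stopping the platform might look like:

$ rm omag.server.test.config
$ java -Dloader.path=..../distribution/target/egeria-connector-apache-atlas-package-1.1-SNAPSHOT.jar -jar server-chassis-spring-1.1-SNAPSHOT.jar

… followed by re-running the two POSTs above (enabling the repository proxy and starting the server instance).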

From there you can continue to override other methods of the OMRSMetadataCollectionBase class to implement the other metadata functionality for searching, updating and deleting as well as retrieving other instances of metadata like relationships. Most of these methods can be directly invoked (and therefore tested) using the REST API endpoints of the OMAG server.

A logical order of implementation might be:

Read operations

getEntitySummary()

… which you can test through GET to

http://egeriahost:8080/servers/test/open-metadata/repository-services/users/myself/instances/entity/{{guidOfEntity}}/summary

getEntityDetail()

… which you can test through GET to

http://egeriahost:8080/servers/test/open-metadata/repository-services/users/myself/instances/entity/{{guidOfEntity}}

getRelationshipsForEntity()

… which you can test through POST to

http://egeriahost:8080/servers/test/open-metadata/repository-services/users/myself/instances/entity/{{guidOfEntity}}/relationships

… with a payload like the following (to retrieve all relationships):

{
  "class": "TypeLimitedFindRequest",
  "pageSize": 100
}
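
These calls can be exercised with any REST client; for example with curl (replace {{guidOfEntity}} with a real GUID from your repository):

$ curl "http://egeriahost:8080/servers/test/open-metadata/repository-services/users/myself/instances/entity/{{guidOfEntity}}"
$ curl -X POST -H "Content-Type: application/json" \
    --data '{"class": "TypeLimitedFindRequest", "pageSize": 100}' \
    "http://egeriahost:8080/servers/test/open-metadata/repository-services/users/myself/instances/entity/{{guidOfEntity}}/relationships"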

These are likely to require the most significant logic for any mappings / translations you’re doing between the open metadata types and your own repository. For example, with Apache Atlas this is where we translate between native types like AtlasGlossaryTerm (as represented in the Apache Atlas Java client) and open metadata types like GlossaryTerm (as represented through the standard OMRS interfaces).

The other main area to then implement is searching, for example:

findEntitiesByProperty()

… which you can test through POST to

http://egeriahost:8080/servers/test/open-metadata/repository-services/users/myself/instances/entities/by-property

… with a payload like the following (to find only those GlossaryTerms classified as SpineObjects and whose name also starts with Empl):

{
  "class": "EntityPropertyFindRequest",
  "typeGUID": "0db3e6ec-f5ef-4d75-ae38-b7ee6fd6ec0a",
  "pageSize": 10,
  "matchCriteria": "ALL",
  "matchProperties": {
    "class": "InstanceProperties",
    "instanceProperties": {
      "displayName": {
        "class": "PrimitivePropertyValue",
        "instancePropertyCategory": "PRIMITIVE",
        "primitiveDefCategory": "OM_PRIMITIVE_TYPE_STRING",
        "primitiveValue": "Empl*"
      }
    }
  },
  "limitResultsByClassification": [ "SpineObject" ]
}

findEntitiesByClassification()

… which you can test through POST to

http://egeriahost:8080/servers/test/open-metadata/repository-services/users/myself/instances/entities/by-classification/ContextDefinition

… with a payload like the following (to find only those GlossaryTerms classified as ContextDefinitions where the scope of the context definition contains local; note that to change the classification type, you change the end of the URL path above):

{
  "class": "EntityPropertyFindRequest",
  "typeGUID": "0db3e6ec-f5ef-4d75-ae38-b7ee6fd6ec0a",
  "pageSize": 100,
  "matchClassificationCriteria": "ALL",
  "matchClassificationProperties": {
    "class": "InstanceProperties",
    "instanceProperties": {
      "scope": {
        "class": "PrimitivePropertyValue",
        "instancePropertyCategory": "PRIMITIVE",
        "primitiveDefCategory": "OM_PRIMITIVE_TYPE_STRING",
        "primitiveValue": "*local*"
      }
    }
  }
}

findEntitiesByPropertyValue()

… which you can test through POST to

http://egeriahost:8080/servers/test/open-metadata/repository-services/users/myself/instances/entities/by-property-value?searchCriteria=address

… with a payload like the following (to find only those GlossaryTerms that contain address somewhere in one of their textual properties):

{
  "class": "EntityPropertyFindRequest",
  "typeGUID": "0db3e6ec-f5ef-4d75-ae38-b7ee6fd6ec0a",
  "pageSize": 10
}

and so on.

Ideally you will have access to a search API for your repository so that you can fulfill these requests efficiently. You want to avoid pulling back a large portion of your metadata and looping through it in memory to find specific objects; instead, push the search down to your repository itself as much as possible.

Once you have those working, it should be relatively easy to go back and fill in areas like the other TypeDef-related methods, to ensure your connector can participate appropriately in a broader open metadata cohort.

Write operations

While the operations above are necessary for all connectors, if you’ve decided to also implement write operations for your repository, there are further methods to override. These include:

  • creation operations like addEntity,
  • update operations like updateEntityProperties,
  • and reference copy-related operations like saveEntityReferenceCopy.

If you are only implementing a read-only connector, these methods can be left as-is and the base class will indicate they are not supported by your connector.

7. Add the event mapper connector

The event mapper connector enables an existing metadata repository to distribute its metadata changes, as events, to the other metadata repositories that are members of the same OMRS cohort. It is not a mandatory component: as long as your connector can “speak” Egeria through an OMRSMetadataCollection, it can participate in an open metadata cohort via the Enterprise Connector. However, if your metadata repository already has some kind of event or notification mechanism, the event mapper can be an efficient way to participate in the broader open metadata cohort.

Within the same adapter Maven module, perhaps under a new sub-package like ...eventmapper, implement the following:

Implement an OMRSRepositoryEventMapperProvider

Start by writing an OMRSRepositoryEventMapperProvider specific to your connector, which extends OMRSRepositoryConnectorProviderBase. The connector provider is a factory for its corresponding connector. Much of the logic needed is coded in the base class, and therefore your implementation really only involves defining the connector class and setting this in the constructor.

For example, the following illustrates this for the Apache Atlas Repository Connector:

package org.odpi.egeria.connectors.apache.atlas.eventmapper;

import org.odpi.openmetadata.frameworks.connectors.properties.beans.ConnectorType;
import org.odpi.openmetadata.repositoryservices.connectors.stores.metadatacollectionstore.repositoryconnector.OMRSRepositoryConnectorProviderBase;

public class ApacheAtlasOMRSRepositoryEventMapperProvider extends OMRSRepositoryConnectorProviderBase {

    static final String  connectorTypeGUID = "daeca2f1-9d23-46f4-a380-19a1b6943746";
    static final String  connectorTypeName = "OMRS Apache Atlas Event Mapper Connector";
    static final String  connectorTypeDescription = "OMRS Apache Atlas Event Mapper Connector that processes events from the Apache Atlas repository store.";

    public ApacheAtlasOMRSRepositoryEventMapperProvider() {
        // Tell the base class which connector class this provider is a factory for
        Class<?> connectorClass = ApacheAtlasOMRSRepositoryEventMapper.class;
        super.setConnectorClassName(connectorClass.getName());
        // Describe the connector type: a unique GUID plus a meaningful name and description
        ConnectorType connectorType = new ConnectorType();
        connectorType.setType(ConnectorType.getConnectorTypeType());
        connectorType.setGUID(connectorTypeGUID);
        connectorType.setQualifiedName(connectorTypeName);
        connectorType.setDisplayName(connectorTypeName);
        connectorType.setDescription(connectorTypeDescription);
        connectorType.setConnectorProviderClassName(this.getClass().getName());
        super.setConnectorTypeProperties(connectorType);
    }

}

Note that you’ll need to define a unique GUID for the connector type, along with a meaningful name and description. Beyond that, all you really need to implement is the constructor, which can largely be copied and pasted for most adapters. Just remember to change the connectorClass to your own, which you’ll implement in the next step (below).

Implement an OMRSRepositoryEventMapper

Next, write an OMRSRepositoryEventMapper specific to your connector, which extends OMRSRepositoryEventMapperBase and implements VirtualConnectorExtension and OpenMetadataTopicListener. This defines the logic to pick up and process events or notifications from your repository and produce corresponding OMRS events. As such, the main logic of this class will be implemented by:

  • Overriding the initialize() method to define how you will initialize your event mapper. For example, this could be connecting to an existing event bus for your repository, or some other mechanism through which events should be sourced.
  • Overriding the start() method to define how to start up the processing of such events.
  • Implementing the initializeEmbeddedConnectors() method to register as a listener to any OpenMetadataTopicConnectors that are passed as embedded connectors.
  • Implementing the processEvent() method to define how to process each event received from your repository’s event / notification mechanism.

The bulk of the logic in the event mapper should be called from this processEvent() method: defining how events that are received from your repository are processed (translated) into OMRS events that deal with Entities, Classifications and Relationships.

Typically you would want to construct such instances by calling into your OMRSMetadataCollection, ensuring you produce the same payloads of information for these instances whether they are retrieved through the API or received through events.

Once you have the appropriate OMRS object, you can make use of the methods provided by the repositoryEventProcessor, configured by the base class, to publish these to the cohort. For example:

  • repositoryEventProcessor.processNewEntityEvent(...) to publish a new entity instance (EntityDetail)
  • repositoryEventProcessor.processUpdatedRelationshipEvent(...) to publish an updated relationship instance (Relationship)
  • and so on

To add the event mapper to the OMAG Server Platform configuration you started with above, two further configuration calls are needed:

Configure the cohort event bus

This should be done first, before any of the other configuration steps above, by POSTing to:

http://egeriahost:8080/open-metadata/admin-services/users/myself/servers/test/event-bus?connectorProvider=org.odpi.openmetadata.adapters.eventbus.topic.kafka.KafkaOpenMetadataTopicProvider&topicURLRoot=OMRSTopic

… with a payload like the following:

{
  "producer": {
    "bootstrap.servers":"kafkahost:9092"
  },
  "consumer": {
    "bootstrap.servers":"kafkahost:9092"
  }
}
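
For example with curl, quoting the URL so that the shell does not interpret the ampersand in the query string:

$ curl -X POST -H "Content-Type: application/json" \
    --data '{"producer": {"bootstrap.servers": "kafkahost:9092"}, "consumer": {"bootstrap.servers": "kafkahost:9092"}}' \
    "http://egeriahost:8080/open-metadata/admin-services/users/myself/servers/test/event-bus?connectorProvider=org.odpi.openmetadata.adapters.eventbus.topic.kafka.KafkaOpenMetadataTopicProvider&topicURLRoot=OMRSTopic"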

Configure the event mapper

This can be done nearly last, after all of the other configuration steps above, but still before starting the server instance. Specify your canonical OMRSRepositoryEventMapperProvider class name for the connectorProvider={javaClassName} parameter and the connection details for your repository’s event source in the eventSource parameter, by POSTing to:

http://egeriahost:8080/open-metadata/admin-services/users/myself/servers/test/local-repository/event-mapper-details

For example, for Apache Atlas we would POST to:

http://egeriahost:8080/open-metadata/admin-services/users/myself/servers/test/local-repository/event-mapper-details?connectorProvider=org.odpi.egeria.connectors.apache.atlas.eventmapper.ApacheAtlasOMRSRepositoryEventMapperProvider&eventSource=atlashost:9027
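
Again with curl, quoting the URL so the shell does not split it at the ampersand:

$ curl -X POST "http://egeriahost:8080/open-metadata/admin-services/users/myself/servers/test/local-repository/event-mapper-details?connectorProvider=org.odpi.egeria.connectors.apache.atlas.eventmapper.ApacheAtlasOMRSRepositoryEventMapperProvider&eventSource=atlashost:9027"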

8. Test your connector’s conformance

Aside from the API-based testing you might do as part of the ongoing implementation of your OMRSMetadataCollection class, once you have most of the methods implemented it is a good idea to test your connector against the Egeria Conformance Suite.

This will provide guidance on what features you may still need to implement in order to conform to the open metadata standards.

Once your connector conforms, you should also have the output needed to apply to use the ODPi Egeria Conformant mark.
