Data science for all – an open source approach to education

By | Blog, ODPi OpenDS4All

Editor’s Note: This blog post from Ana Echeverri is reposted from the IBM Global Data Science Forum blog

Today is an exciting day for me. After months of hard work, IBM, the University of Pennsylvania, and the Linux Foundation are announcing an innovative, first-of-a-kind open source project that will enable universities around the world to build Data Science programs faster.

With IBM’s investment and industry expertise, the University of Pennsylvania’s long-standing academic leadership, and the Linux Foundation as a premier open source consortium, we are creating a curriculum kit composed of open source building blocks for teaching the core concepts of data science in undergraduate and graduate programs. These building blocks are based on Python and open source tools and frameworks, and include slides, documentation, code, and data sets that can be adopted or updated by anyone.

This idea of open source Data Science education is personal to me. Access to education changed my life. I come from a small town in Colombia, South America, and education gave me the opportunity to work with cutting edge Data Science and AI technologies at one of the best companies in the world (IBM). I believe this project will provide a foundation of building blocks for schools to supplement, strengthen and start up their data science programs. And most importantly, because this is open source, any institution on earth can use it, providing more opportunities for learners to participate in the AI Economy like I did.

When I first started this project, I met with universities in different regions of the world and a common theme emerged: starting a Data Science program from scratch is incredibly difficult, and universities need educational materials to accelerate their efforts. This was encouraging, and it also validated the need: there is demand worldwide, and this concept of open source education could reach across oceans as well as into our local community colleges.

By making a “starter set” of training materials available and providing guidance on how to build a Data Science program, IBM and cross-industry partners and educators working together can help accelerate the availability of skills building programs around the world.

It is the beginning of a new era for Data Science Education.

The project is in incubation currently as IBM and UPenn create the initial set of materials to contribute.  The project will officially launch in early 2020. To get early insights and stay up to date with this project please register here.

Getting started with Egeria notebooks using docker

By | ODPi Egeria

Do you like understanding a new technology hands-on yet also want to understand the concepts? Concerned it will take too long to get started?

Wait no longer! You can now experiment with Egeria by making use of our new Jupyter notebooks installed via Docker. Within minutes (plus download time) you’ll be happily running REST API calls against a live Egeria environment, and gaining an understanding of Egeria’s concepts.

In this first blog post I’ll take you through getting set up with a lab environment and running your first notebook.


Before we get started on setting up Egeria, you’ll need access to a few things:

  • docker – the environment in which to run Egeria
  • git – the source code control tool to get files needed

Setting up docker

Docker makes it easy to run pre-created environments in ‘containers’ which are isolated from the host machine such as your laptop. The instructions here were tested with ‘Docker for Mac’, but you can also use ‘Docker for Windows’, or docker installed on linux.

Note: The containers are linux containers built for Intel 64 bit architecture, so they won’t work on ARM, nor will they work in Windows containers …

Once you’ve installed docker, make sure it’s running as covered in the docs above. If using windows or mac, you should see a docker icon (a whale) on the toolbar.

Setting up git

git is the tool we use to manage our code. If you don’t have it installed, install it from the git website (easiest), or else from your Linux distribution or Homebrew. No special configuration is needed.

Retrieving the Egeria code

You’re now ready to retrieve the Egeria code. Whilst we only need a few files for the docker work, having the full code will be useful for further exercises and for following along with other blog posts.

Open up a command window (mac, windows or linux), switch to a suitable directory and type:

git clone 

This will pull down the egeria code locally to your machine.

Running the notebooks

We’re now ready to run the notebook. To do this we will use a feature of docker called ‘docker-compose’. This is a simple approach to running multiple containers (think of these as applications or services) together.

For this example we are running

To get started with the docker compose environment (each command is a single line; replace / with \ on Windows):

cd egeria/open-metadata-resources/open-metadata-deployment/compose/tutorials
docker-compose -f egeria-tutorial.yaml up

At this point you’ll notice a lot of activity. Once it has settled down, open a web browser and go to http://localhost:18888. You should see a Jupyter notebook environment open, with a list of our current labs shown in the left-hand folder tree.

If you don’t see the UI appear, press CTRL-C, and retry the docker compose command. Sometimes a slower network download can cause things not to start properly first time.
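One way to tell when things have settled down is to poll the Jupyter address until it answers. The sketch below is only an illustration in Python: the function name `wait_for_jupyter`, the number of attempts, the delay and the timeout are assumptions made for this example and are not part of Egeria. Only the URL comes from the step above.

```python
import time
import urllib.request


def wait_for_jupyter(url="http://localhost:18888", attempts=30, delay=2.0,
                     fetch=urllib.request.urlopen, sleep=time.sleep):
    """Poll `url` until it answers with HTTP 200; give up after `attempts` tries."""
    for _ in range(attempts):
        try:
            with fetch(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Covers connection refused, timeouts and urllib's URLError:
            # the containers are probably still starting.
            pass
        sleep(delay)
    return False
```

Calling `wait_for_jupyter()` after `docker-compose ... up` returns `True` once the notebook server responds, and `False` if it never does, in which case the CTRL-C and retry advice above applies.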

Running the notebooks

In the Jupyter UI navigate to ‘administration’ and open up the `read-me-first` notebook. This introduces you to how to set up an Egeria environment for a fictional company, ‘Coco Pharmaceuticals’.

The large blue bar is effectively a cursor. It shows where you are in the notebook. Read each paragraph in turn and then hit the ‘play’ button to progress through the notebook. You can also press SHIFT-ENTER to run the current step and move to the next one. As well as text, some paragraphs contain code, which is executed live against a real Egeria server in your docker environment.

Once you’ve worked through this notebook try ‘managing-servers’ which goes into more specifics of how to start and stop servers. Other tutorials get into topics such as accessing assets.

Shutting down the environment

docker-compose -f egeria-tutorial.yaml down

Updating the environment

Each time the environment is started the same code will be run, since the containers are only downloaded the first time they are used.

In order to refresh the containers and run the latest code (recommended) run:

docker-compose -f egeria-tutorial.yaml pull

Further information

If you have any problems running the notebooks:

The containers we used above can be used in other ways too – stay tuned to the blog to find out more.

How Do I Teach My Second Grade Kid What AI Is?

By | Blog, ODPi BI and AI

By Cupid Chan, CTO, Index Analytics

I recently took my kids to Hersheypark in Pennsylvania. In case you haven’t heard of it, it’s just a normal amusement park with rides and long lines. As we were waiting in line, my son asked, “Dad, what are you doing at work?”

I said, “I help my clients to define KPIs, and then try to apply Naive Bayes to predict the outcome. If the result is not good, we may need to build a neural network, and test it again.”

Do you really think that’s the answer I gave my son? 


Of course not. Not because what I said is wrong, but because he is simply not the right audience for that type of response. More importantly, I don’t want him to think “My dad is crazy and I’d better not ask him anything again.” So, I need to come up with an answer in a language that he can understand.

“If a computer can do work but no one knows whether it’s you doing the work or the computer, that’s AI.” – a basic principle of AI proposed by Alan Turing.

“Great! I can then use AI to do my homework and my teacher would not know that it’s not me doing that!”

Supervised Learning 

“Hmm… Do you remember how you taught your younger sister the difference between a pen and an apple? You hold up a pen in front of her so she can see it and say, ‘pen.’ And you hold up an apple so she can see it and say, ‘apple.’ And you repeat this. Sooner or later, you expect her to understand the long pointy thing is a pen. And the red, round thing is an apple.”

Long, pointed, round, red: these are Features in Machine Learning, and “Pen” and “Apple” are Labels. Learning from Features paired with Labels is Supervised Learning. This is one way a computer can learn that different Features are associated with different Labels.

“Dad, I remember I saw a guy teaching people this on YouTube, too!”

PIKOTARO – PPAP (Pen Pineapple Apple Pen) (Long Version) [Official Video]

Well, the song is funny, but it is not related to Supervised Learning. Still, if it plants the concept of Supervised Learning in a child’s mind, why not let it be?

In the real world, Supervised Learning can help in many different ways. One of them is distinguishing a cancer cell from a normal cell. In this case, the computer is the “child” and the doctor is the “parent.” By showing examples repeatedly, the doctor trains the computer to recognize the patterns that separate a normal cell from a cancer cell.
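For readers who want to see the pen-and-apple lesson as code, here is a minimal sketch in Python. It uses a nearest-centroid classifier, which is just one simple Supervised Learning technique chosen for illustration. The feature values (elongation, roundness, redness on a 0 to 1 scale) are made up, and the function names `train` and `predict` are assumptions for this example.

```python
from collections import defaultdict
from math import dist


def train(examples):
    """Average the feature vectors shown for each label (nearest-centroid)."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(model, features):
    """Return the label whose average example is closest to `features`."""
    return min(model, key=lambda label: dist(model[label], features))


# Made-up Features: (elongation, roundness, redness), each between 0 and 1.
# The second item in each pair is the Label the "parent" says out loud.
examples = [
    ((0.9, 0.1, 0.0), "pen"),
    ((0.8, 0.2, 0.1), "pen"),
    ((0.1, 0.9, 0.9), "apple"),
    ((0.2, 0.8, 0.8), "apple"),
]
model = train(examples)

print(predict(model, (0.85, 0.15, 0.2)))  # a long, thin object
print(predict(model, (0.15, 0.95, 0.7)))  # a round, red object
```

The "teaching" is the `train` step: the computer never sees a rule for what a pen is, only examples with their Labels attached.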

Unsupervised Learning

You may have heard of the Law of Entropy, or the Second Law of Thermodynamics. In general, unless you put in energy to keep things in their current state, everything just becomes messier over time.

You can apply the very same law to a kid’s playground. Unless you really put in effort to keep toys tidy, the toys will not automatically go back to their original positions. At my home, my mother-in-law helps out the kids to keep the play areas organized. Once, when she went to Hong Kong for a vacation, the play areas became more disorganized day after day. Finally, my wife had to step in and demand that the kids clean up before grandmother returned. She did not give exact instructions. She just demanded they clean up!

Guess what happened in the next few hours? The kids put all the four-wheeled, boxy things in one area, and we called it “Cars.” All the fluffy stuff was put together in another area, and we called it “Stuffed Animals.” And then they put all the blocks that can be stacked up together in some boxes and named them “Legos.”

They did not get any specific instructions or rules to decide what should go where. But somehow they figured out the similarities and differences. In Machine Learning, this is called Unsupervised Learning.

This is when the computer is given a lot of data points and figures out the pattern by itself. In the real world, Unsupervised Learning can be used in customer segmentation. There is a lot of information and data about a lot of customers. You don’t tell the computer who should be grouped with whom; Unsupervised Learning figures this out. Traditionally, this is done by an expert who observes different patterns, like age, spending pattern, where you live, salary… and then tries to group the types of customers together. And now we have the machine to play the role of the expert: it can scan through millions of records in a few seconds, which is impossible for any human being.
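The toy clean-up can be sketched in a few lines of Python. The example below uses k-means clustering, one common Unsupervised Learning technique picked here for illustration. The toy features (how wheeled, how fluffy, how stackable, each from 0 to 1) are made up, and the deterministic way of choosing starting points is a simplification assumed for this example.

```python
from math import dist


def kmeans(points, k, iterations=10):
    """Group `points` into `k` clusters without ever being given a label."""
    # Deterministic start: the first point, then repeatedly the point
    # farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )

    clusters = []
    for _ in range(iterations):
        # Step 1: put every point in the pile whose centre is nearest.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 2: move each centre to the middle of its pile.
        centroids = [
            tuple(sum(column) / len(pile) for column in zip(*pile))
            if pile else centroids[i]
            for i, pile in enumerate(clusters)
        ]
    return clusters


# Made-up features per toy: (wheeled, fluffy, stackable), each 0..1.
toys = [
    (1.0, 0.0, 0.1),  # car
    (0.0, 0.9, 0.0),  # stuffed animal
    (0.0, 0.0, 1.0),  # block
    (0.9, 0.1, 0.0),  # car
    (0.0, 1.0, 0.1),  # stuffed animal
    (0.1, 0.1, 0.9),  # block
]
clusters = kmeans(toys, 3)
for pile in clusters:
    print(pile)
```

Notice that the code never mentions "Cars" or "Legos." The piles come out of the similarities alone, and naming them is left to the humans afterwards.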

Reinforcement Learning

When dealing with kids, the best approach is not always to keep telling them what to do and showing them the proper examples. At the same time, it’s not very effective to give no instructions and let them figure out everything by themselves.

It’s a common practice in teaching kids to reward them when they do something good. And when they do something bad, you punish them. This is intended to reinforce certain behaviors. In Machine Learning, this is known as Reinforcement Learning.

When a computer performs the way that you want, you add a point. When it fails to do what you want, you deduct a point. The computer therefore learns what to do to gain points.

In the real world, Reinforcement Learning is applied heavily in Robotics. For example, a robot is trying to walk in a straight line. It may make it or it may fall down. Whenever the robot falls down, you deduct a point. And whenever the robot successfully takes a step, you add a point. There are many motors and sensors on a robot, and all of them are collecting data for the system. The robot learns what kind of motor speed and what kind of angle are needed in order to keep walking in a straight line and avoid falling.
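Here is a heavily simplified sketch of that point system in Python. It uses an epsilon-greedy strategy, one basic Reinforcement Learning approach chosen for illustration: most of the time the robot repeats the action with the best score so far, and occasionally it tries something at random. The robot itself is hypothetical. The lean angles and the rule in `toy_robot` that decides between a step and a fall are invented for this example, and a real robot would be far less predictable.

```python
import random


def learn_to_walk(actions, reward_fn, episodes=200, epsilon=0.1, seed=0):
    """Epsilon-greedy learning: keep a running average reward per action."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in actions}
    count = {a: 0 for a in actions}
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(actions)          # explore: try something
        else:
            action = max(actions, key=value.get)  # exploit: best so far
        reward = reward_fn(action)
        count[action] += 1
        # Incremental mean of the rewards seen for this action.
        value[action] += (reward - value[action]) / count[action]
    return value


def toy_robot(lean_degrees):
    """Hypothetical rule: +1 for a successful step, -1 for a fall."""
    return 1 if abs(lean_degrees) <= 5 else -1


actions = [-20, -10, 0, 10, 20]  # how far the robot leans, in degrees
value = learn_to_walk(actions, toy_robot)
best = max(actions, key=value.get)
print(best, value)
```

Nobody tells the robot that standing upright is the answer. It only ever sees points going up or down, and the upright action ends up with the highest score.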

2 Types of Measurement

2 Popular Questions by Kids – Key Approaches in Machine Learning

Kids like to ask strangers, “How old are you?” and “Are you a boy or a girl?”

“How old are you?” is asking for a number. It’s Regression.

“Are you a boy or a girl?” is Classification: it looks for an outcome from a set of pre-defined categories. Both are important concepts in Machine Learning.
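The difference between the two kinds of answer shows up clearly in code. In the Python sketch below, `fit_line` is an ordinary least squares fit that answers with a number (Regression), and `classify` picks the closest of a set of pre-defined categories (Classification). The heights, ages and the "child"/"adult" categories are made-up data, and both function names are assumptions for this example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def classify(value, examples):
    """Pick the category whose example values have the closest mean."""
    by_label = {}
    for x, label in examples:
        by_label.setdefault(label, []).append(x)
    return min(by_label, key=lambda label: abs(value - mean(by_label[label])))


# Regression: "How old are you?" -> a number (made-up heights in cm, ages).
heights = [100, 110, 120, 130, 140]
ages = [4, 6, 8, 10, 12]
slope, intercept = fit_line(heights, ages)
predicted_age = slope * 125 + intercept
print(predicted_age)

# Classification: the answer is one of a fixed set of categories.
labelled = [(105, "child"), (120, "child"), (165, "adult"), (180, "adult")]
print(classify(125, labelled))
print(classify(170, labelled))
```

Both functions start from the same kind of input, a height. What differs is the shape of the answer: a number on a scale in one case, a name from a short list in the other.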

3 Ways to Learn

Kids observe the world around them and come up with certain rules. They propose a result and are corrected by adults, which makes the rules get better and better.

Compare that to the old way of programming: a developer observes the world, codes rules using rule-based algorithms, and gets some results. Based on those results, the developer changes or modifies the rules.

In AI, it’s a little bit different. The developer creates the AI algorithm and has it create the rules. The algorithm comes up with a model and continues to train it. The model then tries to predict the result, and the prediction is checked for accuracy. The key here is that the algorithm keeps modifying the model using more data without the developer being involved.
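To make the contrast concrete, the Python sketch below puts a hand-written rule next to a learned one. Everything in it is hypothetical: the "is it hot?" question, the temperatures and the function names are invented for this example, and the learner is deliberately tiny (it just picks the threshold that makes the fewest mistakes on the examples it has seen).

```python
def hand_written_rule(temperature_c):
    """Old way: the developer decides what counts as 'hot'."""
    return temperature_c >= 25


def learn_rule(examples):
    """AI way: pick the threshold that misclassifies the fewest examples."""
    candidates = sorted({t for t, _ in examples})

    def errors(threshold):
        return sum((t >= threshold) != is_hot for t, is_hot in examples)

    return min(candidates, key=errors)


# Each example: (temperature in Celsius, did people say it felt hot?)
first_batch = [(12, False), (18, False), (21, False), (27, True), (31, True)]
first_threshold = learn_rule(first_batch)
print(first_threshold)

# More data arrives; nobody edits the code, but the rule changes.
more_data = first_batch + [(23, True), (24, True)]
second_threshold = learn_rule(more_data)
print(second_threshold)
```

To change `hand_written_rule`, a developer has to open the file and edit the number 25. To change the learned rule, you only have to show it more examples.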

That’s the beauty of AI!

No Right or Wrong. Just Right or Left! 

Final question: What are the similarities and differences between Tesla and Uber? They are both in the automobile industry. But one company, Tesla, creates new technology to help revolutionize the whole car industry, while Uber uses existing technology (like mapping, mobile apps, etc.) to create a new business model.

So the power of AI is not just in making algorithms. It can also lie in using existing algorithms to build new ways of doing business. One builds the technology; the other utilizes it.

Remember my son who was thinking about ways to get his homework done? Ultimately, I would be equally proud if he came up with an algorithm that could do his homework and successfully fool his teacher or if he utilized existing algorithms to do the same thing. Both are important new ways of adopting AI to solve problems. 

There is no Right or Wrong, only Right or Left. But no matter which direction you pick, be persistent and you will cross the finish line of success via either route – Cupid Chan tweet on Nov 28, 2018

The content of this blog has been presented at a few national and international conferences, such as Open Source Summit in Shanghai, China and MicroStrategy Federal Summit in Washington, DC. I also captured this in the very first video on my YouTube channel, which you can find here: 

Twitter: @cupidckchan

