Category

ODPi BI and AI

How Do I Teach My Second Grade Kid What AI Is?

By | Blog, ODPi BI and AI

By Cupid Chan, CTO, Index Analytics

I recently took my kids to Hersheypark in Pennsylvania. In case you haven’t heard of it, it’s just a normal amusement park with rides and long lines. As we were waiting in line, my son asked, “Dad, what do you do at work?”

I said, “I help my clients to define KPIs, and then try to apply Naive Bayes to predict the outcome. If the result is not good, we may need to build a neural network, and test it again.”

Do you really think that’s the answer I gave my son? 

OF COURSE NOT!

Not because what I said is wrong, but because he is simply not the right audience for that type of response. More importantly, I don’t want him to think, “My dad is crazy and I’d better not ask him anything again.” So I need to come up with an answer in a language that he can understand.

“If a computer can do work but no one knows whether it’s you doing the work or the computer, that’s AI.” – a basic principle of AI proposed by Alan Turing.

“Great! I can then use AI to do my homework and my teacher would not know that it’s not me doing that!”

Supervised Learning 

“Hmm… Do you remember how you taught your younger sister the difference between a pen and an apple? You hold up a pen in front of her so she can see it and say, ‘pen.’ And you hold up an apple so she can see it and say, ‘apple.’ And you repeat this. Sooner or later, you expect her to understand the long pointy thing is a pen. And the red, round thing is an apple.”

Long, pointy, round, red: these are Features in Machine Learning. “Pen” and “Apple” are Labels. Teaching with examples that pair Features with Labels is Supervised Learning. It is one way a computer can learn that different Features are associated with different Labels.

“Dad, I remember I saw a guy teaching people this on YouTube, too!”

PIKOTARO – PPAP (Pen Pineapple Apple Pen) (Long Version) [Official Video]

Well, the song is funny, but it is not related to Supervised Learning. But if it helps plant the concept of Supervised Learning in a child’s mind, why not let it be?

In the real world, Supervised Learning can help in many different ways. One of them is distinguishing a cancer cell from a normal cell. In this case, the computer is the “child” and the doctor is the “parent.” By showing examples repeatedly, the doctor trains the computer to tell the patterns of a normal cell apart from those of a cancer cell.
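For readers a bit older than second grade, the pen-and-apple lesson can be sketched in a few lines of Python. This is only a toy illustration: the way the Features are encoded (1 for yes, 0 for no), the four training examples, and the "find the most similar example" rule are all assumptions made up for this sketch, not anything used in a real project.

```python
# Toy supervised learning: each example pairs Features with a Label.
# Features are (long, pointy, round, red), each 1 for yes or 0 for no.
# The data and the nearest-neighbour rule are illustrative assumptions.
training_examples = [
    ((1, 1, 0, 0), "pen"),
    ((1, 1, 0, 1), "pen"),    # a red pen
    ((0, 0, 1, 1), "apple"),
    ((0, 0, 1, 0), "apple"),  # a green apple
]


def count_differences(a, b):
    """Count how many Features differ between two objects."""
    return sum(x != y for x, y in zip(a, b))


def predict(features):
    """Return the Label of the training example most similar to `features`."""
    _, label = min(
        training_examples,
        key=lambda example: count_differences(example[0], features),
    )
    return label


print(predict((1, 1, 0, 0)))  # a long, pointy object
print(predict((0, 0, 1, 1)))  # a round, red object
```

The "teaching" here is nothing more than storing labeled examples. Real systems learn far subtler patterns, but the ingredients are the same: Features in, Label out.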

Unsupervised Learning

You may have heard about the Law of Entropy, or the Second Law of Thermodynamics. Loosely speaking, unless you put in energy to keep things in their current state, everything will just become messier over time.

You can apply the very same law to a kid’s playground. Unless you really put in effort to keep toys tidy, the toys will not automatically go back to their original positions. At my home, my mother-in-law helps the kids keep the play areas organized. Once, when she went to Hong Kong for a vacation, the play areas became more disorganized day after day. Finally, my wife had to step in and demand that the kids clean up before grandmother returned. She did not give exact instructions. She just demanded they clean up!

Guess what happened in the next few hours? The kids put all the boxy, four-wheeled things in one area, and we called it “Cars.” All the fluffy stuff was put together in another area, and we called it “Stuffed Animals.” And then they put all the blocks that can be stacked together into some boxes and named them “Legos.”

They did not get any specific instructions or rules to decide what should go where. But somehow they figured out the similarities and differences. In Machine Learning, this is called Unsupervised Learning.

This is when the computer is given a lot of data points and figures out the pattern by itself. In the real world, Unsupervised Learning can be used in customer segmentation. There is a lot of information and data about a lot of customers. You don’t tell the computer who should be grouped with whom; Unsupervised Learning figures that out. Traditionally, this was done by an expert who observed different patterns, like age, spending pattern, where you live, salary… and then tried to group similar types of customers together. Now we have the machine play the role of the expert, and it can scan through millions of records in a few seconds, which is impossible for any human being.
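Here is a minimal Python sketch of that idea, using k-means, one common grouping technique. Everything specific in it is an assumption for illustration: the six (age, monthly spend) records are invented, the choice of two groups is arbitrary, and starting from the first and last customer is just a simple way to get going.

```python
# Toy unsupervised learning: k-means groups customers without any Labels.
# The (age, monthly spend) records and the choice of 2 groups are made up.
customers = [(22, 30), (25, 35), (24, 28), (58, 210), (61, 190), (55, 200)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(
        range(len(centers)),
        key=lambda i: squared_distance(point, centers[i]),
    )


def k_means(points, centers, max_rounds=10):
    """Alternate between assigning points to centers and re-centering."""
    groups = []
    for _ in range(max_rounds):
        new_groups = [nearest(p, centers) for p in points]
        if new_groups == groups:  # nothing moved, so we are done
            break
        groups = new_groups
        for i in range(len(centers)):
            members = [p for p, g in zip(points, groups) if g == i]
            if members:  # keep the old center if a group is empty
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return groups


# Start from two customers as initial guesses (a simple, arbitrary choice).
groups = k_means(customers, [customers[0], customers[-1]])
print(groups)  # [0, 0, 0, 1, 1, 1]
```

Nobody told the code which customers belong together. Like the kids sorting toys, it only looked at which records are similar to each other.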

Reinforcement Learning

When dealing with kids, simply telling them what to do and showing them proper examples over and over is not always the best approach. At the same time, it’s not very effective to give no instructions and let them figure out everything by themselves.

It’s a common practice in teaching kids to reward them when they do something good. And when they do something bad, you punish them. This is intended to reinforce certain behaviors. In Machine Learning, this is known as Reinforcement Learning.

When a computer performs the way that you want, you add a point. When it fails to do what you want, you deduct a point. Over time, the computer learns what to do to gain points.

In the real world, Reinforcement Learning is applied heavily in Robotics. For example, a robot is trying to walk in a straight line. It may make it or it may fall down. Whenever the robot falls down, you deduct a point. And whenever the robot successfully takes a step, you add a point. There are many motors and sensors on a robot, and all of them are collecting data for the system. The robot learns what motor speeds and angles are needed to keep walking in a straight line and avoid falling.
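The add-a-point, deduct-a-point idea can be sketched in Python as follows. This is a deliberately tiny stand-in for a real robot: the list of lean angles, the rule that the robot falls whenever it leans more than 5 degrees, and the learning settings are all hypothetical. The learner uses a simple "mostly do what has worked, occasionally try something else" strategy (often called epsilon-greedy), which is only one of many Reinforcement Learning approaches.

```python
import random

# Toy reinforcement learning: the "robot" tries lean angles and keeps score.
# The angles, the fall rule, and the settings below are made-up assumptions.
ANGLES = [-20, -10, 0, 10, 20]  # candidate lean angles in degrees


def take_step(angle):
    """Pretend environment: +1 for a successful step, -1 for a fall."""
    return 1 if abs(angle) <= 5 else -1


def learn(steps=200, explore_rate=0.1, seed=0):
    rng = random.Random(seed)
    average_reward = {angle: 0.0 for angle in ANGLES}
    tries = {angle: 0 for angle in ANGLES}
    for _ in range(steps):
        if rng.random() < explore_rate:
            angle = rng.choice(ANGLES)  # occasionally try something else
        else:
            angle = max(ANGLES, key=average_reward.get)  # use what worked best
        reward = take_step(angle)
        tries[angle] += 1
        # Nudge the running average for this angle toward the new reward.
        average_reward[angle] += (reward - average_reward[angle]) / tries[angle]
    return average_reward


scores = learn()
best_angle = max(scores, key=scores.get)
print(best_angle)  # the angle the robot has learned to prefer
```

No one tells the robot that standing upright is the answer. It discovers that by collecting points, the same way a kid learns which behavior earns the reward.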

2 Types of Measurement

2 Popular Questions by Kids – Key Approaches in Machine Learning

Kids like to ask strangers, “How old are you?” and “Are you a boy or a girl?”

“How old are you?” is asking for a number. It’s Regression.

“Are you a boy or a girl?” is asking for one of a set of pre-defined categories. It’s Classification. Both are important concepts in Machine Learning.
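The difference shows up clearly in code: a Regression function returns a number, while a Classification function returns one of a fixed set of categories. The Python sketch below is only an illustration. All the data is invented, the regression is a plain straight-line fit, and for the classification part I swapped in a hypothetical “kid or grown-up” question answered from height, simply because it is easy to show with one number.

```python
# Regression answers with a number; Classification answers with a category.
# All data below is invented for illustration.

# "How old are you?" -> Regression: fit age = slope * grade + intercept.
grades = [1, 2, 3, 4]
ages = [6, 7, 8, 9]

mean_grade = sum(grades) / len(grades)
mean_age = sum(ages) / len(ages)
covariance = sum((g - mean_grade) * (a - mean_age) for g, a in zip(grades, ages))
variance = sum((g - mean_grade) ** 2 for g in grades)
slope = covariance / variance
intercept = mean_age - slope * mean_grade


def predict_age(grade):
    """Regression: the answer is a number."""
    return slope * grade + intercept


# "Kid or grown-up?" -> Classification: pick a pre-defined category.
heights_cm = {"kid": [110, 120, 130], "grown-up": [160, 170, 180]}
average_height = {group: sum(h) / len(h) for group, h in heights_cm.items()}


def predict_group(height_cm):
    """Classification: the answer is the group with the closest average."""
    return min(average_height, key=lambda g: abs(average_height[g] - height_cm))


print(predict_age(5))      # 10.0, a number
print(predict_group(125))  # kid, a category
```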

3 Ways to Learn

Kids observe the world around them. They come up with certain rules. They propose a result and are corrected by adults, which makes the rules get better and better.

Compare this to the old way of programming: the developer observes the world, codes rules using rule-based algorithms, and gets some results. Based on those results, the developer changes or modifies the rules.

In AI, it’s a little bit different. The developer creates the AI algorithm and has it create the rules. The algorithm comes up with a model and continues to train it. The model then tries to predict the result, and the prediction is checked for accuracy. The key here is that the algorithm keeps modifying the model using more data without the developer being involved.
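A small Python sketch can make the contrast concrete. In the first function the developer picks the rule by hand. In the class that follows, the rule (a single length threshold) is derived from labeled examples and shifts when more data arrives, with no code change. The class name, the lengths, and the midpoint rule are all hypothetical choices for illustration, and the sketch assumes pens are longer than apples and that the first batch contains at least one example of each.

```python
# Old way: the developer writes the rule by hand.
def is_pen_by_rule(length_cm):
    return length_cm > 10  # threshold chosen by the developer


# AI way: the algorithm derives the rule (here, one threshold) from data.
# The lengths and the midpoint rule are illustrative assumptions.
class LearnedPenDetector:
    def __init__(self):
        self.pen_lengths = []
        self.apple_lengths = []
        self.threshold = None

    def train(self, examples):
        """Add labeled (length_cm, label) examples and refit the threshold."""
        for length_cm, label in examples:
            if label == "pen":
                self.pen_lengths.append(length_cm)
            else:
                self.apple_lengths.append(length_cm)
        pen_mean = sum(self.pen_lengths) / len(self.pen_lengths)
        apple_mean = sum(self.apple_lengths) / len(self.apple_lengths)
        # The learned rule: the midpoint between the two class averages.
        self.threshold = (pen_mean + apple_mean) / 2

    def is_pen(self, length_cm):
        return length_cm > self.threshold


model = LearnedPenDetector()
model.train([(14, "pen"), (8, "apple")])
print(model.threshold)  # 11.0
model.train([(18, "pen"), (8, "apple")])  # more data, same code
print(model.threshold)  # 12.0
```

To change the first rule, someone has to edit the code. To change the second, you only need to feed in more examples.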

That’s the beauty of AI!

No Right or Wrong. Just Right or Left! 

Final question: What are the similarities and differences between Tesla and Uber? They are both in the automobile industry. But one company, Tesla, creates new technology to help revolutionize the whole car industry, while Uber uses existing technology (like mapping, mobile apps, etc.) to create a new business model.

So the power of AI is not just in making algorithms. It can also lie in using existing algorithms to build new ways of doing business. One builds the technology, one utilizes it.

Remember my son who was thinking about ways to get his homework done? Ultimately, I would be equally proud if he came up with an algorithm that could do his homework and successfully fool his teacher or if he utilized existing algorithms to do the same thing. Both are important new ways of adopting AI to solve problems. 

There is no Right or Wrong, only Right or Left. But no matter which direction you pick, be persistent and you will cross the finish line of success via either route – Cupid Chan tweet on Nov 28, 2018

The content of this blog has been presented at a few national and international conferences, such as Open Source Summit in Shanghai, China, and MicroStrategy Federal Summit in Washington, DC. I also captured it in the very first video on my YouTube channel, which you can find here: https://www.youtube.com/watch?v=dh9xz4SBukE&t=13s

Twitter: @cupidckchan

LinkedIn: www.linkedin.com/in/cupidchan/


ODPi Members IBM, SAS and Index Analytics Share How to Leverage Business Intelligence and Big Data for better ROI in New White Paper

By | Announcements, ODPi BI and AI

SAN FRANCISCO – December 13, 2018 – ODPi, a nonprofit organization accelerating the open ecosystem of big data solutions, today announced a new white paper that showcases how ODPi members IBM, SAS, Index Analytics and other technical contributors leverage Business Intelligence (BI) and Big Data for better ROI. The annual research takes the pulse of industry leaders and displays insight into how BI can be addressed by Hadoop through multi-structured data and advanced big data analytics.

“It doesn’t matter how much data you have; unless you can get the insight from it, it is just bits and bytes occupying the storage,” said Cupid Chan, CTO at Index Analytics. “As more traditional businesses evolve to digital transformation, they need to be leveraging true BI and Big Data into their products and services to get the results they want.”

By 2020, the accumulated volume of big data will increase from 4.4 zettabytes to roughly 44 zettabytes, or 44 trillion GB, and companies will be charged with turning this data into insight. Big data technologies like Hadoop have become an ecosystem of open source projects that provide processing engines to perform data transformation and analysis. Many businesses, however, don’t know how to combine BI with Big Data to get insightful business value.

ODPi’s newest white paper presents best practices for combining BI and Big Data. ODPi members IBM and SAS and current technical contributors from other BI vendors including Qlik, MicroStrategy and Tableau share real end-user perspectives on how they are using big data tools, the challenges they face and where they are looking to enhance their investments.

Highlights of their unique perspectives include:

  • The preferred BI/SQL Connector (Hive, Presto, Impala, etc.) for their BI Tool to connect to Hadoop
  • Best practices to connect to both Hadoop and RDBMS
  • Recommended BI architecture to query data in Hadoop
  • How BI runs advanced analytics, including Machine Learning algorithms, on Hadoop

As an industry practitioner and ODPi lead for the BI & AI Special Interest Group (SIG), Chan surveyed and held discussions with several project members and contributors in the technical community to benchmark the findings of this annual survey. To view the complete white paper, click here.

“As data sizes continue to grow, it becomes harder and harder to gain visual and analytic insights from it,” said Craig Rubendall, Vice President, Platform Research and Development at SAS. “To be successful, it is imperative to use practices and products built to handle the demands of this environment. SAS is proud to collaborate with the other contributors to share our experience and knowledge.”

Hosted by The Linux Foundation, ODPi is an industry effort that aims to accelerate the adoption of big data technologies. Through a vendor-neutral, industry-wide approach to data governance and data science, ODPi members bring maturity and choice to an open ecosystem. In fact, ODPi welcomes Index Analytics as a new member and participant in its growing ecosystem to concentrate on advancing strategies and initiatives for data science, BI and Artificial Intelligence (AI).

About the newest member:

Index Analytics has optimized IT solutions and improved our clients’ return on investment (ROI) by providing high-quality enterprise Health IT solutions to Federal & State government agencies. We encourage a culture of data curiosity at every implementation, championed by our experienced, dedicated personnel. This culture promotes meaningful outcomes for our clients and a strong sense of partnership throughout the process.

About ODPi

ODPi is a nonprofit organization committed to simplification and standardization of the big data ecosystem. As a shared industry effort, ODPi members represent big data technology, solution provider and end user organizations focused on promoting and advancing the state of big data technologies for the enterprise. For more information about ODPi, please visit: http://www.ODPi.org

###

ODPi Webinar on How BI and Data Science Gets Results

By | Blog, Events, ODPi BI and AI

By John Mertic, Director of ODPi at The Linux Foundation

ODPi recently hosted a webinar on getting results from BI and Data Science with Cupid Chan, managing partner at 4C Decision, Moon soo Lee, CTO and co-founder of ZEPL and creator of Apache Zeppelin, and Frank McQuillan, director of product management at Pivotal.

During the webinar, we discussed the convergence of traditional BI and Data Science disciplines (machine learning, artificial intelligence, etc.), and why statistical/data science models can now run on Hadoop in a much more cost-effective manner than a few years ago.

The second part of the webinar focused on demos of Jupyter Notebooks and Apache Zeppelin. These were important and relevant demos, as Data Scientists use Jupyter Notebooks the most, and Apache Zeppelin supports multiple technologies, languages and environments, making it a great tool for BI.

The inspiration for the webinar was the new Data Science Notebook Guidelines. Created by the ODPi BI and Data Science SIG, the guidelines help bridge the gap so that BI tools can sit harmoniously on top of both Hadoop and RDBMS, while providing the same, or even more, business insight to the BI users who also have Hadoop in the backend.

Additionally, webinar listeners asked detailed questions, including:

  • How can one transition from a bioinformatics developer to a data scientist in biostatistics?
  • Where do you see the future of both Jupyter and Zeppelin going? Are there other key data science challenges that need to be solved by these tools?
  • When do you choose to use one notebook over the other?
  • Can the two notebooks be used together, i.e., can you create a Jupyter notebook and save it, then upload it into Zeppelin (or vice versa)?

Overall, the webinar was an insightful discussion on how we can achieve big data ecosystem integration in a collaborative way.

If you missed the webinar, you can watch the replay and download the slides.