The Rise of Big Data Governance: Strata Data Conference and DataWorks Summit Sessions, Webinar, Redguide and More!


Today’s most forward-thinking enterprises face similar data challenges: a reliance on real-time data to better serve their customers and, as a result, the need to comply with regulations that protect that data, such as the EU’s General Data Protection Regulation (GDPR).

The ODPi Data Governance PMC is working to create a neutral, industry-wide approach to data governance. Its members support the mission of creating an open data ecosystem by collaborating with subject matter experts and with vendors of data platforms and tools.

Below are upcoming speaking sessions, meetups, a webinar, and a Redguide meant to further the discussion of, and work on, data governance.

March 6–8, 2018

Strata Data Conference

San Jose, CA

The rise of big data governance: Insight on this emerging trend from active open source initiatives

Speakers:

 Maryna Strelchuk (ING)

 John Mertic (ODPi)

Time: 1:50pm–2:30pm

Date: Wednesday, March 7, 2018

https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/64048

John Mertic and Maryna Strelchuk detail the benefits of a vendor-neutral approach to data governance, explain the need for an open metadata standard, and share how companies like ING, IBM, Hortonworks, and more are delivering solutions to this challenge as an open source initiative. The solution to this emerging challenge is a tricky one. For companies like ING, this data governance challenge has been met with metadata, a consistent view across a large heterogeneous ecosystem, and collaboration with an active open source community.

—————————-

April 16–19, 2018

DataWorks Summit

Berlin, Germany

The rise of big data governance: Insight on this emerging trend from active open source initiatives

Speakers:

 Ferd Scheepers (ING)

 John Mertic (ODPi)

https://dataworkssummit.com/berlin-2018/

Attendees will understand the role of metadata, the need for a cross-technology view on metadata, the role of Apache Atlas as a reference implementation, and the role of ODPi in offering value-added services, such as certification.

ODPi Data Governance PMC

Hosted by:

 Mandy Chessell (IBM)

https://dataworkssummit.com/berlin-2018/bofs/

This Birds of a Feather (BoF) session, hosted by IBM, ING, ODPi, and Hortonworks, will include discussions around the ODPi Data Governance PMC. Come and share your experiences, challenges, and future interests.

—————————-

April 26, 2018 at 9am PT / 12pm ET

ODPi Webinar

Speakers: Mandy Chessell (IBM), John Mertic (ODPi)

Topic – Discussion of the IBM Redguide “The Journey Continues: From Data Lake to Data-Driven Organization”, an overview of the ODPi Data Governance PMC and a look at what’s to come this year.

Sign up here: https://www.odpi.org/projects/data-governance-pmc 

Check @ODPi on Twitter for details soon!

—————————-

Download Now!

The Journey Continues: From Data Lake to Data-Driven Organization

Written by Mandy Chessell (IBM), Ferd Scheepers (ING), Maryna Strelchuk (ING), Ron van der Starre (IBM), Seth Dobrin (IBM), and Daniel Hernandez (IBM)

http://www.redbooks.ibm.com/Abstracts/redp5486.html?Open  

This IBM Redguide™ publication looks back on the key decisions that made the data lake successful and looks forward to the future. It proposes that the metadata management and governance approaches developed for the data lake can be adopted more broadly to increase the value that an organization gets from its data. Delivering this broader vision, however, requires a new generation of data catalogs and governance tools built on open standards that are adopted by a multi-vendor ecosystem of data platforms and tools.

Work is already underway to define and deliver this capability, and there are multiple ways to engage. This guide covers the reasons why this new capability is critical for modern businesses and how you can get value from it.

ODPi Webinar on How BI and Data Science Get Results


By John Mertic, Director of ODPi at The Linux Foundation

ODPi recently hosted a webinar on getting results from BI and Data Science with Cupid Chan, managing partner at 4C Decision, Moon soo Lee, CTO and co-founder of ZEPL and creator of Apache Zeppelin, and Frank McQuillan, director of product management at Pivotal.

During the webinar, we discussed the convergence of traditional BI and data science disciplines (machine learning, artificial intelligence, etc.), and why statistical and data science models can now run on Hadoop in a much more cost-effective manner than a few years ago.

The second part of the webinar focused on demos of Jupyter Notebooks and Apache Zeppelin. These demos were important and relevant: data scientists use Jupyter Notebooks the most, while Apache Zeppelin supports multiple technologies, languages, and environments, making it a great tool for BI.

The inspiration for the webinar was the new Data Science Notebook Guidelines. Created by the ODPi BI and Data Science SIG, the guidelines help bridge the gap so that BI tools can sit harmoniously on top of both Hadoop and RDBMS, while providing the same, or even more, business insight to BI users who also have Hadoop in the backend. Download Now »

Webinar listeners also asked detailed questions, including:

  • How can one transition from bioinformatics developer to data scientist in biostatistics?
  • Where do you see the future of both Jupyter and Zeppelin going? Are there other key data science challenges that these tools need to solve?
  • When do you choose to use one notebook over the other?
  • Can the two notebooks be used together? That is, can you create a Jupyter notebook and save it, then upload it into Zeppelin (or vice versa)?

Overall, the webinar was an insightful discussion on how we can achieve big data ecosystem integration in a collaborative way.

If you missed the webinar, watch the replay and download the slides.

Checking In: Apache™ Bigtop “Test Drive” Grant Recipients’ Work in Full Swing


Back in April, with the release of ODPi 2.1, we fully transitioned to leveraging Apache Bigtop for our reference implementation and validation test suite needs.

Following such a successful changeover, we launched the Apache™ Bigtop “Test Drive” Grant Program – a new grant funding program designed to increase developer involvement in the Apache Software Foundation (ASF) project.

With this program, ODPi has invested $50,000 to fund developer work from some of the world’s top Apache and big data developers and architects to expand Bigtop’s functionality and usability.

The response to the call for proposals was tremendous, and we at ODPi want to thank everyone for your amazing ideas on how to improve Bigtop. After careful deliberation, the Technical Steering Committee (TSC) selected the following two proposals:

  • The “Automatic cluster orchestration and provisioning” project – led by Dr. Konstantin Boudnik, Chief Technologist of BigData & Open Source Fellow at EPAM Systems – aims to cut the time that cluster provisioning takes, which is currently unacceptably long for development and testing environments. Because this issue affects every developer and quality engineer who needs semi-frequent cluster bring-ups, his progress will be meaningful to all Apache Bigtop users.
  • “Project Frontier” – led by Evans Ye, VP of Apache Bigtop & big data engineer at Yahoo! – aims to provide an easy-to-use integration test framework for Apache Bigtop, which doesn’t currently work closely with other Hadoop ecosystem projects on releases and integration tests to ensure that versions of different projects work properly with one another.

In launching this “Test Drive,” our goal is for the results to benefit enterprise end users and developers and, ultimately, to build a stronger relationship with the Apache Software Foundation and the big data communities it hosts.

As the program ends on February 2, make sure to come back for individual updates and official outcomes from the developers!

ODPi and AtScale to Partner on 3rd Annual Big Data Maturity Survey


We are very excited to announce that ODPi will join AtScale and their other partners Cloudera, Hortonworks and MapR in the 3rd annual Big Data Maturity survey.

The Big Data Maturity Survey is designed to provide a global view of big data use by industry, workload and geography.

The survey takes less than 5 minutes and you can take it here (http://www.surveygizmo.com/s3/3648348/odpi).

Your responses will remain confidential and will be used only to construct the aggregated results.

In appreciation for your time, you’ll be entered in a drawing to win a $100 Amazon gift card.

Why ODPi is Partnering with AtScale

We have referenced the Big Data Maturity survey in prior blogs, and we couldn’t be happier that AtScale asked us to help get the word out to our members and to the enterprise production Apache Hadoop community we serve.

AtScale’s report confirms many of the things ODPi members tell us about the state of enterprise-wide production Apache Hadoop, and the challenges many businesses face in getting there.

73% in Production

As the callout on the left shows, the 2016 Big Data Maturity survey found that 73% of respondents run Hadoop in production.

ODPi members and community validate this figure, with one caveat: we want to understand what users actually mean when they say “production.”

A look at Figure 1 shows what we mean.

 

 

Figure 1: Production ≠ Production

Clearly, lab (that is, experimental) deployments do not qualify as production, but PoCs, pilots, and of course enterprise-wide deployments do.

So, if my calculations are correct: 14% PoC  + 29% Pilot + 28% Enterprise-wide = 71% Total in Production
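The tally above can be checked with a few lines of Python. This is only a minimal sketch: the variable names are illustrative, and the shares are the ones quoted in this post (read from Figure 1) alongside the 73% figure cited from AtScale's 2016 survey.

```python
# Deployment-stage shares quoted in the post (percent of respondents).
# Lab deployments are left out because the post does not count them as production.
stage_share = {"PoC": 14, "Pilot": 29, "Enterprise-wide": 28}

# Sum the stages that the post counts as production.
production_total = sum(stage_share.values())

# Production figure quoted from AtScale's 2016 Big Data Maturity survey.
atscale_production = 73

# Difference between the two figures, in percentage points.
gap = atscale_production - production_total

print(f"Production total from Figure 1 shares: {production_total}%")
print(f"Gap to the AtScale figure: {gap} percentage points")
```

The two numbers come from different bodies of research, so a small gap like this is expected.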

It’s great when two different bodies of research arrive at nearly the same figure (71% versus 73%) using different approaches. It means we can have even more confidence in the results!

BUT, 72% of users are in Pre-Enterprise-wide Production

As bullish as ODPi is about the production opportunities for Hadoop and Big Data, what we see in Figure 2 is that 72% of Hadoop users are piled in what we’ll call limited use scenarios.

Figure 2: Most Hadoop users not yet in Enterprise-wide

ODPi exists to provide the neutral space where the Big Data industry can come together to knock down our common barriers to enterprise-wide production Hadoop.

Governance and Security Key to Unlocking Door to Enterprise-wide Use

As the callout from the 2016 Big Data Maturity report shows, governance is the fastest-growing concern.

ODPi is also focused on the need for an industry-wide approach to Hadoop governance. Partnering with leading Big Data users, platform providers, and app vendors, ODPi will establish standard APIs and compliance programs that will ensure compatibility across BI tools and various metadata stores.

Conclusion

Be sure to take 5 minutes out to complete this year’s Big Data Maturity survey. You’ll be able to use the results to benchmark your efforts against your industry peers and gain valuable insight into technology trends that impact you.

Here at ODPi, we will use the research to continue focusing on building open programs, open source projects and free resources to smooth the path towards enterprise-wide production Hadoop.

Stay Informed

Sign up for our Newsletter to receive the latest ODPi news and updates.