All Posts By

John Mertic

ODPi Goes to Apache: Big Data North America!

By Blog

Apache: Big Data North America takes place next week, May 9-12 in Vancouver, Canada, and will gather a kaleidoscope of big data devotees from across the Apache project community to further the education and advancement of Apache open source projects. As an upstream contributor to the Apache Software Foundation and an organization that strives to build upon the innovative work of the ASF, we are thrilled to be exhibiting and speaking at Apache: Big Data!

Alongside a few of our member companies, including Ampool, IBM, Hortonworks, and Pivotal, we will show attendees how ODPi eases integration and standardization for downstream application vendors and end users that build upon Apache Hadoop®.

The ODPi booth (#11) on the exhibit floor will feature a Hacker’s Lounge where attendees can come by and meet industry innovators like Ampool founder and CEO Milind Bhandarkar, Ph.D.

As mentioned, there are a number of great ODPi-focused sessions during the show. Here’s an overview of our must-see sessions:


  • Roman Shaposhnik & Konstantin Boudnik from Pivotal will discuss How ODPi Leveraged Apache Bigtop to Get to Market Faster (and You Can Too!). Attendees will learn how contributions from ODPi members are helping Apache Bigtop get even stronger and provide an integration platform for the next generation of big data technologies.
  • There will be an ODPi panel, ODPi: Advancing Open Data for the Enterprise, where attendees will receive an overview of how Hadoop distributors, ISVs, SIs, and enterprises (end users) will benefit from standardization. Panel members include John Mertic of ODPi, Milind Bhandarkar of Ampool, Terrance Yim of CaskData, Alan Gates of Hortonworks, and Susan Malaika of IBM, moderated by Roman Shaposhnik of Pivotal.
  • Milind Bhandarkar of Ampool dives into a riveting story, Standing on Shoulders of Giants: Ampool Story, and also discusses how Ampool is contributing to several ASF projects in ODPi.
  • Alan Gates from Hortonworks provides a thought-provoking keynote, ODPi 101: Who We Are, What We Do and Don’t Do. Gates will outline the ODPi Core: a set of software components, a detailed certification, and a set of open source tests to make it easier to create big data solutions and data-driven applications.
  • John Mertic from ODPi delivers an important keynote, ODPi and ASF: Building a Stronger Hadoop Ecosystem, where he details how ODPi’s specifications and by-laws reinforce the role of the ASF as the singular place where Apache Hadoop development occurs.
  • John Mertic of ODPi and Jim Jagielski of the Apache Software Foundation team up for a riveting conversation, ODPi and ASF Collaboration. Attendees will learn how the ASF and ODPi complement each other and work together, where the big data ecosystem is heading, and more. Mark your calendar now; you will not want to miss this session!

A supporter of the work the ASF has done with the Apache Hadoop ecosystem since our inception, we are looking forward to supporting, connecting with, and learning from the Apache project community on advancing the big data ecosystem.

We hope you will join us at Apache: Big Data!

Register now for Apache Big Data.

Making the Case for ODPi: How We Are Furthering the Hadoop and Big Data Ecosystem

By Blog

As a Linux Foundation project, we are lucky to benefit from the vast experience the foundation has in dealing with open source projects experiencing rapid growth. With the organization’s support and our 25 members working together, we believe we can increase adoption and open opportunities for innovation on top of an already strong Hadoop community.

With our first release – the ODPi Runtime Specification – behind us, we thought we’d share insights from several of our founding members on how ODPi fits into the Hadoop and big data ecosystem. This panel discussion, featuring executives from Hortonworks, Pivotal, IBM, and Reactor8, also covered what kinds of companies will benefit from developing new standards across the ecosystem.


How Hadoop Can Help Organizations Manage and Take Advantage of Big Data

Organizations across industries are struggling to manage the massive amounts of data available to them. In response, Hadoop is quickly replacing data warehouse platforms, offering a distributed processing framework designed to address the volume and complexity of big data environments involving a mix of structured, unstructured and semi-structured data.

As Hadoop has matured over the last decade, it has proven to be a reliable and popular platform among developers requiring a technology that can power large, complex applications. However, Hadoop components and distributions are innovating quickly and in many different directions even as the platform is widely adopted. We believe that by cooperating in ODPi we can accelerate that adoption further by lowering the barrier to entry.

How ODPi Fits Into the Big Data Ecosystem

ODPi was created to complement the Apache Software Foundation and help companies use Apache Hadoop more effectively. Our main focus as we go forward with the first release of our Runtime Spec is enabling applications that sit on top of big data stacks. Our aim is to develop a cross-industry standard that enables developers to easily write these applications and ensure that they interoperate across systems.

[NOTE] We recently published the ODPi Runtime Specification and test suite to ensure applications will work across multiple Apache Hadoop® distributions.

Thus far, ODPi has focused on core Hadoop technologies including HDFS, MapReduce, and YARN for its Runtime Spec and Ambari for the Operational Spec. Over time we look to expand this to projects including Hive, Spark, and more, for a holistic offering that covers what diverse companies are using and that is driven by the industry needs our members see in the ecosystem.
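One recurring portability problem for applications that sit on top of a Hadoop stack is locating the Hadoop installation itself without hardcoding a vendor-specific path. The sketch below is illustrative only: the fallback order and function name are our own assumptions, not requirements taken from the Runtime Spec.

```python
import os
import shutil

def find_hadoop_home(env=os.environ):
    """Resolve the Hadoop installation root without hardcoding a vendor path.

    Checks the conventional HADOOP_HOME environment variable first, then
    falls back to locating the `hadoop` launcher on PATH (which normally
    lives in $HADOOP_HOME/bin).
    """
    home = env.get("HADOOP_HOME")
    if home:
        return home
    launcher = shutil.which("hadoop")
    if launcher:
        # .../hadoop/bin/hadoop -> .../hadoop
        return os.path.dirname(os.path.dirname(launcher))
    raise RuntimeError("no Hadoop installation found")
```

An application written this way runs unchanged on any distribution that exposes the conventional environment variable or launcher, which is the kind of cross-distro guarantee a shared runtime specification is meant to standardize.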


Are we open to other management tools outside of Ambari? Alan Gates, co-founder at Hortonworks, responded to this audience question, noting that “there is not conformity of opinion yet about the answer.” However, we work closely with Ambari, and as Gates continued, “this is where ODPi complements Apache instead of competes. What we are trying to do is say ‘here’s how to use this well,’ and when people have new features or whatever they want in Ambari, we help them feed that back upstream into the Apache community. We don’t try to push that onto the Apache community.”

If Ambari doesn’t have a crucial feature that a company needs when they are looking at joining, ODPi helps share that feedback upstream to Ambari and works to get that feature included. However, other distros may choose to not manage their distros with Ambari – our members can be certified on the Runtime Spec without working with Ambari.

It is important to note, according to Roman Shaposhnik, founder of Apache Bigtop and director of Open Source at Pivotal, that there is “a pretty good upside to being compatible with how Ambari manages your applications because you want to benefit from as many applications that exist within the Hadoop ecosystem. If those applications all get to be managed by Ambari and you start managing them with something else, well then you have to go back to the application data” and work on ensuring interoperability.


Shaposhnik continues, highlighting that he thinks “that there is enough of an upside to make it interesting to application developers, but whether we see other management tools following in the footsteps of the Spec, and becoming more like Ambari, I hope we will, but there is no guarantee.”

What Kind of Companies Will Benefit from Standardization?

Companies across the Hadoop and big data ecosystem will greatly benefit from standardization. The panel covered several of these types of companies, including:

  • ISVs and SIs: Depending on the vendor, Hadoop features can vary significantly. A huge benefit ODPi offers for ISVs and SIs, as we produce specifications that touch on configuration etc., is that they can go to any ODPi-certified vendor and know that their application will work. They can test against the ODPi distribution and have confidence that it will work against the whole stack.

  • End Users: Hadoop upgrades are a struggle for many companies. For example, many Internet-scale companies maintain Hadoop either directly or within their operations. It would be very beneficial for these companies to externalize their use cases through the test and validation work ODPi is doing, especially if they rely on an obscure or legacy Hadoop feature that the platform supports but few actually enable. Unfortunately, unless they bring this information to us, we can’t help them. Because ODPi is open to all, these companies don’t have to join, but as members and part of our User Advisory Board they can help us fold their use cases into our workflow, which would speed up how quickly they could update their internal Hadoop distributions.

ODPi 2016 Release Schedule and Roadmap

By Blog

Forrester Research’s big data analysts recently stated in a new research report, Forrester Wave™: Big Data Hadoop Distributions, Q1 2016, that adopting Hadoop is “mandatory” for any organization that wishes to do advanced analytics and get actionable insights on their data. Yet Forrester also estimates that between 60% and 73% of data that enterprises have access to goes unused for business intelligence and analytics.

With application developer and delivery professionals adopting Hadoop “en masse,” the analyst firm predicts that 100% of all large enterprises will adopt Hadoop and related technologies such as Spark for big data analytics within the next two years.

With such industry need for Hadoop, ODPi is looking forward to helping make adoption and interoperability easy for enterprises in the Hadoop ecosystem. We just announced our first release and have a strong release schedule and roadmap for 2016.

ODPi Release Schedule

  • March 31: The first release of the ODPi Runtime specification and test suite, which will ensure applications work across multiple Apache Hadoop® distributions.
  • June/July: The first release of the ODPi Operational specification, run on Ambari, to help enterprises improve installation and management of Hadoop and Hadoop-based applications.
  • September: The second release of the ODPi Runtime specification and test suite.


ODPi Roadmap

  • Establish user advisory council with six end user members.
  • Ensure that three Hadoop vendor members ship ODPi Runtime-compliant distributions.
  • Enable ISVs to indicate their compatibility with the ODPi reference implementation.
  • Include member-confirmed projects in next release.
  • Plan certification programs that avoid splintering the code, with a roadmap that goes beyond one-size-fits-all certification and solves future customer challenges today.

We look forward to Hadoop distribution vendors becoming ODPi Runtime compliant. Here’s how the process works.

Runtime Spec Compliance and Certification

  • ODPi Runtime Compliance is achieved through self-certification by Hadoop distro vendors, which takes a few days at most.
  • The ODPi Test Framework is based on Apache Bigtop.
  • Bigtop is an Apache Software Foundation project for Infrastructure Engineers and Data Scientists looking for comprehensive packaging, testing, and configuration of the leading open source big data components. Bigtop supports a wide range of components/projects, including, but not limited to, Hadoop, HBase and Spark.


  • BONUS: Over half the code in the latest Bigtop release originated in ODPi.
  • All ODPi Runtime Compliance tests are linked directly to lines in the ODPi Runtime Specification.
  • To assist with compliance, in addition to the test suite, ODPi also provides a reference build.
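The traceability point above (every compliance test maps to specific lines of the Runtime Specification) can be sketched in a few lines. This is a hypothetical illustration, not the real ODPi harness: the check names and section IDs are invented, and the actual test framework is built on Apache Bigtop.

```python
# Each compliance check carries the Runtime Specification section it
# verifies, so a failing check points straight at the requirement a
# distribution violates. Section IDs below are invented for illustration.

SPEC_CHECKS = {
    "runtime-spec-2.1-hadoop-version": lambda env: env.get("hadoop_version", "").startswith("2.7"),
    "runtime-spec-3.4-hadoop-home":    lambda env: bool(env.get("HADOOP_HOME")),
}

def run_compliance(env):
    """Run every registered check and return {spec_section: passed}."""
    return {section: bool(check(env)) for section, check in SPEC_CHECKS.items()}

# A distro vendor would run this against a description of its own environment.
report = run_compliance({"hadoop_version": "2.7.1", "HADOOP_HOME": "/opt/hadoop"})
compliant = all(report.values())
```

The payoff of this structure is that a self-certification report doubles as a pointer into the spec: each failed entry names the exact requirement to fix.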

Become a member of ODPi, or get involved with the ODPi project on GitHub by signing up for our mailing list or sending pull requests. Membership is not a requirement to become involved with ODPi’s technology, and all development is done in the open.

Our Week at Strata + Hadoop San Jose

By Blog

Last week was a sensational week for ODPi. We announced our first release – the ODPi Runtime Specification – to positive feedback, and participated in Strata + Hadoop San Jose via our member booths.

Here’s a look back:

Our First Release – ODPi Runtime Specification

On Monday, March 28th we announced the first release of the ODPi Runtime Specification and test suite to ensure applications will work across multiple Apache Hadoop® distributions. This announcement generated a flurry of media coverage from CMSwire, ZDNet, SiliconANGLE, Datanami, Data Informed, ITBusinessEdge, CIO, InfoWorld, The Register, Computer Business Review, The New Stack, and Fierce Big Data, to name just a few.

Some of our favorite quotes include:

  • “ODPi could be a game-changer for enterprises that plan to embark on big data implementations because they won’t have to spend their precious time and resources worrying whether one Hadoop leveraging solution or application will be compatible with the next and the next because they’ll all be based on the same core.” – Virginia Backaitis, CMSwire
  • “One major Hadoop distributor reputedly employs 40 people just to ensure compatibility among products. By standardizing the Hadoop stack, the ODPi hopes to boost compatibility, cut down on complexity and reduce the need for testing, which are becoming big problems that threaten to slow adoption of the platform.” – Alex Woodie, Datanami
  • “Backed by Hortonworks, IBM, GE and the Pivotal arm of EMC, Mertic says ODPi is designed to complement the effort of the Apache Software Foundation that is charged with developing the raw bits that make up Hadoop and related open source projects such as YARN and HDFS.” – Mike Vizard, ITBusinessEdge

In addition, several members took to the blogging world and expressed support for the runtime specification:

The conversations did not stop there. Twitter was buzzing with views on the runtime specification. Here are some of our favorites:



Strata + Hadoop San Jose

We joined many of our members at Strata. Companies like Altiscale, DataTorrent, EMC, Hortonworks, IBM, Infosys, Pivotal, SAS, Unifi, and VMWare all proudly displayed their ODPi membership at their individual booths. Here are some of our favorite member moments:


Overall, it was a wonderful week. We are looking forward to Apache: Big Data, May 9-12 in Vancouver, BC. Hope to see you there!


By Blog

Who We Are

ODPi is a nonprofit organization committed to simplification and standardization of the big data ecosystem with a common reference specification called ODPi Core.

As a shared industry effort and Linux Foundation project, ODPi is focused on promoting and advancing the state of Apache Hadoop® and big data technologies for the enterprise.

The rapid influx of digital information available to enterprises has resulted in a big data ecosystem that is challenged and slowed by fragmented, duplicated efforts. ODPi’s members aim to accelerate the adoption of Apache Hadoop and related big data technologies with the goal of making it easier to rapidly develop applications through the integration and standardization of a common reference platform called the ODPi Core.

Where We Are Today

  • ODPi currently has 26 members and more than 35 maintainers from 25 companies dedicated to its ongoing work.

  • Membership investments nearly doubled since ODPi was announced in February 2015.

  • Open to all with a very low hurdle for any developer or company to participate and have an impact.

Key Points

  1. ODPi provides cross-compatibility between different distributions of Hadoop and big data technologies.

    1. ODPi Core specifies how Apache components should be installed and configured and provides a set of tests for validation to make it easier to create big data solutions and data-driven applications.

    2. ODPi Core is not a distribution, it’s an industry standard deployment model over which the industry can build enterprise-class big data solutions.

  2. The fragmented Hadoop market increases ISV costs, reduces innovation, and makes delivering business value harder. By solving these problems, ODPi fills a gap in the big data ecosystem.

    1. To overcome the interoperability and fragmentation challenges this industry faces, it will take all of us working together. Linux is a great example of how open source can speed innovation and market transformation – that’s what we’re doing at ODPi.

    2. Organized to support the ASF, ODPi promotes innovation and development of upstream projects like Hadoop and Ambari.

    3. Now 10 years old, Hadoop has become a mature technology that serves hyperscale environments and can handle widely varying amounts and types of data. It’s a proven and popular platform among developers requiring a technology that can power large, complex applications.

    4. Yet, Hadoop components and Hadoop Distros are innovating very quickly and in many different ways. This diversity, while healthy in many ways, also slows big data ecosystem development and limits adoption.

    5. The industry now needs more open source-based big data technologies and standards so application developers and enterprises are able to more easily build data-driven applications.

  3. The ODPi Core removes cost and complexity to accelerate the development of big data solutions.

    1. ODPi helps the three key ecosystem players:

      1. Hadoop Platforms (distros): ODPi compliance guidelines enable ODPi-compatible software to run successfully on their solutions. The guidelines also allow providers to patch their customers expeditiously in emergencies.

      2. ISVs/SIs: ODPi compatibility guidelines allow them to “test once, run everywhere,” eliminating the burden and cost of certification and testing across multiple distributions. They can have a predictable release cadence to reduce maintenance and support costs.

      3. Enterprises (end users): Ability to run any “ODPi-compatible” big data software on any “ODPi-compliant” platform and have it work.

  4. ODPi will bring value to the market by:

    1. Standardizing the commodity work of the components of a Hadoop distribution

    2. Providing a common platform against which to certify apps, reducing the complexities of interoperability

    3. Ensuring a level of compatibility and standardization across distribution and application offerings for management and integration

FAQ: Project Scope and Roadmap

Q: How is testing administered? What is the process for becoming ODPi compliant?  

A: Testing is currently self-administered. To become ODPi compliant, vendors must submit test results for the product release they would like certified. They do not have to comply with every specification for every product release.

This GitHub repository is where vendors can commit their ODPi spec test runs to let others know when their distro is compliant. Instructions on how to report self-certification are also included.

Q: How long is the testing process to become ODPi-certified?

A: The specification has just become available, but several members that have been planning to do the validation believe running the tests will take only 20 minutes, making it a one- to two-day effort at most overall.

Q: Can you explain the ODPi release cycle?

A: ODPi will continue developing the Runtime Specification with updated releases coming every six months. After the March release, expect another in October 2016. The ODPi Operations Specification 1.0 is expected late this summer.

Q: When will the Operations Specification be published?

A: The ODPi Operations Specification is the other piece of the ODPi Core puzzle.  It will help improve installation and management of Hadoop and Hadoop-based applications and will be available in late summer.  The Operations Specification covers Apache Ambari, the ASF project for provisioning, managing, and monitoring Apache Hadoop clusters.

Q: How does ODPi complement the Apache Software Foundation (ASF)?

A: The Apache Software Foundation supports many rapidly growing open source projects. ODPi, a complementary shared-industry organization, is solely focused on easing integration and standardization within the Hadoop ecosystem.

ODPi is also contributing to ASF projects in accordance with ASF processes and intellectual property guidelines. ODPi will support community development and outreach activities that accelerate the rollout of modern data architectures that leverage Apache Hadoop. For example, ODPi is contributing back to projects like Ambari and Bigtop, with more than half the code in the latest release of Bigtop coming from ODPi.

Q: How do I get involved?

A: Membership is not a requirement to become involved with ODPi’s technology, as all development is done in the open. Get involved with the ODPi project on GitHub by signing up for our mailing list, sending pull requests, or giving us feedback. Our governance model offers one-member, one-vote equality.

Q: How is ODPi governed and managed?

A: ODPi runs under an open governance model that offers one-member, one-vote equality. This ensures our members bring a balanced representation of the big data ecosystem, with perspective and expertise well beyond Hadoop.

Q: What is the role of The Linux Foundation with ODPi?

A: ODPi is a Linux Foundation project that is independently funded. It harnesses the power of collaborative development to fuel innovation across the big data ecosystem. By aligning with The Linux Foundation, ODPi is able to leverage the best practices for community governance, operations, and development that the organization established running Linux.

Member Companies Ampool, DataTorrent, Hortonworks, Linaro, Pivotal and SAS Share Thoughts on ODPi

By Blog

We had a chance to catch up with some of our member companies ahead of Strata + Hadoop San Jose and ODPi’s first release of the ODPi Runtime Spec and Test Suite to hear what their plans are for the show, why they joined ODPi and what they are looking forward to most on ODPi Core.

Here’s what Ampool founder and CEO Milind Bhandarkar, Ph.D.; DataTorrent architect Thomas Weise, who is also a member of the Apache Apex (incubating) PPMC; and Martin Stadtler, director of the Linaro Enterprise Group (LEG), had to say.

1. How do you anticipate ODPi changing the big data ecosystem?

Ampool: Standardization is very helpful and powerful in a technology ecosystem, as it provides compatibility. Standardization on how the various components interact with each other and having standard APIs will give a boost to the entire ecosystem, as it will let us focus on our product rather than spending a lot of time on ensuring compatibility.

A startup has to look at ease of use and faster time to delivery in the larger big data ecosystem, without worrying about different version compatibility. ODPi will help reduce the complexity and set-up time associated with ensuring compatibility. ISVs can now focus time on the business problems of their customers. We can innovate rapidly! This will help make Hadoop consumable for enterprises. ODPi is an essential step in the right direction.

DataTorrent: We expect it to simplify developing and testing applications that work across distros and hence lower the cost of building Hadoop based big data applications. DataTorrent, for example, can certify RTS installation and runtime for ODPi and know it will work with multiple platform providers.

Linaro: Removing fragmentation, and allowing the big data application vendors to focus on innovating their offerings and not needing to port to various Hadoop releases.

2. What are the aspects of the ODPi Core that you are looking forward to most for your company?

DataTorrent: Standardized runtime environment with core dependencies YARN and HDFS. Test/certify once, install and run on any ODPi compliant distro.

Linaro: Standardization of Hadoop based on ODPi, optimized for the ARMv8 (64-bit) ecosystem.

Hortonworks Sr. Director of Alliance and Partner Marketing, Robin Liong, shares his thoughts on ODPi Core and HDP Hadoop Core.

Pivotal Software Data Evangelist, Jeff Kelly, shares his thoughts on ODPi To Cap Its First Year with Runtime Specification for Apache Hadoop.

3. Why did you join ODPi?

Ampool: I have been involved in the big data ecosystem for more than 10 years. When I was at Yahoo as an engineer, I had a chance to work on a project that became Hadoop in 2006. It was a small project deployed on 20 machines at the time, but when I left the company in 2010 it was running on 45K machines. Hadoop truly had become the backbone of Yahoo’s data infrastructure.

Over the last six years, a lot of projects have emerged and become popular in the Hadoop ecosystem, like Hive, HAWQ, etc. Given the momentum of these different additions, integration quickly became an important need in the Hadoop ecosystem, and customers started paying attention to integration ability. There are multiple Hadoop distributions with different versions, and the APIs were not compatible. For an ISV who wanted to build on top of this ecosystem, it became challenging and time consuming to test each product for compatibility with all of the different distributions and versions out there (21 in total).

Ampool wanted to join a project that would help standardize Hadoop to alleviate the challenge for ISVs by reducing testing overhead. As a startup, we do not have the manpower to do this, so we wanted to increase our compatibility for our customers.

DataTorrent: DataTorrent has been committed to cross-distro compatibility from the start. We understand from our own experience the challenges and effort involved in supporting incompatible platforms and their fine nuances. We are looking forward to ODPi making this easier.

Linaro: Linaro is focused on enabling open source technologies on the ARM platform. It is important to ensure multiple architectures have equal support in the data center to provide a choice to big data users.

SAS Vice President of Platform R&D, Craig Rubendall, shares his thoughts on Why SAS joined the Open Data Platform (ODP) initiative.

4. What advice would you give to ISVs looking to join ODPi?

Ampool: ODPi is structured in such a way that platinum members do not run everything. Every voice, even from small startups like Ampool, is heard and your time investment toward this standardization is not superseded by big companies. You have input and are getting in on the ground floor of this effort. It is one member, one vote. I encourage ISVs to get involved early so they can make an impact on this whole effort.

DataTorrent: It is important that ISVs actively participate in the initiative as there are many unique perspectives and broad input is needed to achieve the goals.

Linaro: Invest effort into achieving the shared goals that all members can benefit from, as our members do within Linaro for the ARM ecosystem.

5. How do you see ODPi working with ASF?

Ampool: ODPi will take the innovation coming from Apache and help standardize the projects in the Hadoop ecosystem. It is complementary to the developer ecosystem.

6. What can people at Strata expect to see from your company at the show?

Ampool: We started building our product in August 2015, and we have been conducting a few POCs with it. At Strata, we will be announcing its availability to Hadoop customers and demonstrating the solution on top of ODPi-member distributions, including Hortonworks Data Platform.

Linaro: Parity of processor support between ARM and x86 for Hadoop based on ODPi specification.

ODPi’s First Release is Here!

By Blog

Since its formation in February 2015, ODPi has been working hard to deliver an industry standard for Apache Hadoop and big data technologies.

We’re pleased to announce the Release Team published the Runtime Specification, development tests and deployment sandbox. This is a big step toward reaching our goal to simplify and standardize the big data ecosystem with a common reference specification called ODPi Core.

In this recent video panel, several founding members of ODPi discussed our goals, more details on the new Runtime Specification, and our future roadmap. Let’s take a look at some of the highlights.

Technical Milestones

In recent months, ODPi has reached several new technical milestones. The Release Team has built out a full infrastructure that includes continuous integration builds and continuous deployment of the ecosystem onto clusters. The team has also started integrating these capabilities into some of the tests, and will soon be able to plug in the certification test suite. As soon as new builds with new functionality are completed, the pipeline will be much more streamlined, leading to faster release cycles.

As mentioned in our previous post, we’ve also come up with a more solid release cadence with updates every six months, which puts our next release in the fall timeframe.

ODPi Core

Before jumping into the Runtime spec, let’s take a look at the benefits of ODPi Core. The ODPi Core is a set of software components, a detailed certification, and a set of open source tests that the industry can use to build big data solutions and data-driven applications.

The ODPi Core will deliver the following benefits:

  • For Hadoop technology vendors, reduced R&D costs that come from a shared qualification effort

  • For big data application solution providers, reduced R&D costs that come from more predictable and better qualified releases

  • Improved interoperability within the platform and simplified integration with existing systems in support of a broad set of use cases

  • Less friction and confusion for enterprise customers and vendors


The ODPi Core includes the ODPi Runtime specification published today, as well as the ODPi Operations specification that is in development.

ODPi Runtime Spec Compliance and Certification

The ODPi Runtime specification and test suite, available today, ensure applications can work across multiple Hadoop distributions.

ISVs can use the Runtime spec to build their software properly, and it serves as the basis of a test suite that ODPi-compliant platform providers can use to verify compliance. The test suite covers a minimum set of basic capabilities that the platform must support in order to claim ODPi compatibility.

The Runtime spec and test suite will provide benefits for users of all types:

  • Hadoop Platform providers: ODPi compliance guidelines enable ODPi-compatible software to run successfully on their solutions. The guidelines also allow providers to patch their customers expeditiously in emergencies.

  • ISVs: Provide a mechanism that allows them to “test once, run everywhere.” Any software or application successfully tested against one ODPi-compliant platform will work with any ODPi-compliant platform, reducing the burden of building and supporting different versions of software and applications for different Hadoop platforms.

  • Enterprises: Ability to run any “ODPi-compatible” big data software on any “ODPi-compliant” platform and have it work.


Overall, the spec provides predictability to know what’s being delivered and eliminates the need for creating a number of customizations. If there’s a problem, it can also help to diagnose where it is by testing certain parts of an infrastructure.

ODPi Operations Specification

The ODPi Operations specification is the other piece of the ODPi Core puzzle. It covers management requirements and guarantees for consumers, ISVs, and service developers who are building custom service specifications and views. An application can be packaged as a custom service that is then managed by Ambari.

The management requirement comes from the fact that applications have multiple runtime components, and there needs to be some standard in place regarding how to manage all of them. The spec aims to capture a set of requirements that a component under ODPi, or any application, should provide or specify so it can be managed by Ambari or a similar tool.
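As a concrete illustration, Ambari already describes custom services through a metainfo.xml definition, and a component declared this way is the kind of thing the Operations spec aims to standardize. The service name, component name, and script path below are hypothetical, and a real definition carries more detail:

```xml
<!-- Hypothetical sketch of an Ambari custom-service definition -->
<metainfo>
  <schemaVersion>2.0</schemaVersion>
  <services>
    <service>
      <name>MYAPP</name>
      <displayName>My App</displayName>
      <version>1.0.0</version>
      <components>
        <component>
          <name>MYAPP_SERVER</name>
          <category>MASTER</category>
          <cardinality>1</cardinality>
          <commandScript>
            <script>scripts/master.py</script>
            <scriptType>PYTHON</scriptType>
          </commandScript>
        </component>
      </components>
    </service>
  </services>
</metainfo>
```

With a standard like this, a management tool knows each component’s role, how many instances to expect, and which script starts, stops, and checks it, which is precisely the set of requirements the spec wants every application to declare.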

Today, many new applications sit on top of a basic big data stack. Many also live in the data center alongside several other services. Our goal with the Operations spec is to create a form of standards to make it easy to write these applications and to have them easily interoperate. For developers, this will mean they can look at a management spec, write code, and then run the application.

We are currently building this to run on Ambari, but may eventually expand it to include additional management tools. As previously mentioned, to become ODPi-compliant a vendor does not need to comply with both the ODPi Runtime specification and test suite and the ODPi Operations specification. We understand that it may make sense for some vendors to make only their APIs compliant.

ODPi Update: Core Specifications and Vendor Compliance


ODPi has been working to accelerate the delivery of business outcomes from big data solutions by driving interoperability on an enterprise-ready core platform. Our members represent all of the key personas in the big data ecosystem (big data vendors, ISVs, system integrators, and end users), bring real-world big data experience to the table, and are working to align enterprise demands with the developer community.

In this recent video, Roman Shaposhnik, Director of Open Source at Pivotal, discusses the latest on the Open Data Platform Initiative (ODPi) for developers and users of big data technologies from Apache. He answers many of the questions that developers, vendors, and users in the industry have asked about our mission and future plans.

Highlights from the video follow.

Did you ever have trouble migrating a big data application or have an application that had to be re-written? Did you ever wish instructions were written down to help you with this? This is exactly what we are trying to standardize and provide.

So what exactly is ODPi creating? Our solution is based on three pillars:

  • ODPi Core Specification: Formal document laying out how the Hadoop distribution should behave.
  • ODPi Reference Implementation: Created in support of the specification – a distribution of Hadoop that can be downloaded, installed and used.
  • ODPi Validation Test Suite: The glue that guarantees that a distribution of Apache Hadoop® from a compliant vendor, or an end-user deployment of Hadoop, complies with the ODPi-defined specifications.

What is ODPi core to begin with?

  • ODPi Core specifies how Apache components should be installed and configured and provides a set of tests for validation to make it easier to create big data solutions and data-driven applications.
  • ODPi Core is not a distribution, it’s an industry standard deployment model over which the industry can build enterprise-class big data solutions.
  • ODPi Core initially focuses on Apache Hadoop (inclusive of HDFS, YARN, and MapReduce) as well as Apache® Ambari, and will incorporate additional open source projects in the future.

ODPi Self-Certification Reports

To become ODPi compliant, vendors must submit test results for each product release they would like certified. They do not have to comply with every specification for every product release.

This GitHub repository is where vendors can commit their ODPi spec test runs to let others know when their distro is compliant. Instructions on how to report self-certification are also included.

API and Operational Compatibility  

Both APIs and operational characteristics are important, but they are separate certifications. The first certification level with ODPi is the Runtime specification, or API compliance: as a vendor, you will be able to guarantee that your APIs behave exactly like Hadoop’s.

The second stage of certification, if a vendor chooses to pursue it, is the operational characteristics layer. A vendor will be able to guarantee that its management characteristics are exactly the same as Hadoop’s. This is an extra level of compliance; neither certification is mandatory. Both certifications, together or separately, will open up a whole new dimension of things vendors will be compatible with.

As mentioned, some versions of a product can be ODPi compliant while other versions are not. There is no mandate that vendors be compliant across the board in order to use the ODPi-compliant brand name; however, vendors must make clear to end users which versions are compliant.

How To Get Involved

Get involved with the ODPi project on GitHub by signing up for our mailing list, sending pull requests, or giving us feedback. Membership is not a requirement to become involved with ODPi’s technology, and all development is done in the open.

ODPi also maintains a fully open and transparent pipeline for continuous integration (CI) of the artifacts produced as part of the reference implementation.

ODPi delivers shared infrastructure and open governance to accelerate new releases and participation, driven by a diverse and passionate ecosystem of members. If you are passionate about the mission of ODPi, consider joining us.

In our next blog, we’ll take a closer look at the ODPi roadmap and release schedule.

ODPi: The Open Ecosystem of Big Data – Update and Next Steps


As it’s been a while since we updated everyone on our progress, we thought it would be appropriate to share what ODPi has been up to over the past several months. In upcoming blogs, we will preview some exciting deliverables coming out at the end of March.

ODPi’s journey to today can be thought of as passing through the following four phases:

  1. Problem Recognition
  2. Industry Coalescence
  3. Getting Organized
  4. Getting to Work

In the rest of this blog, I’ll describe each of these phases.

Problem Recognition

If they’re being honest, Hadoop and Big Data proponents recognize that this technology has not achieved its game-changing business potential.

Gartner puts it well: “Despite considerable hype and reported successes for early adopters, 54 percent of survey respondents report no plans to invest [in Hadoop] at this time, while only 18 percent have plans to invest in Hadoop over the next two years,” said Nick Heudecker, research director at Gartner. “Furthermore, the early adopters don’t appear to be championing for substantial Hadoop adoption over the next 24 months; in fact, there are fewer who plan to begin in the next two years than already have.” – Gartner Survey Highlights Challenges to Hadoop Adoption

The top two factors suppressing demand for Hadoop according to Gartner’s research are a skills gap and unclear business value for Hadoop initiatives. In ODPi’s view, the fragmented nature of the Hadoop ecosystem is a leading cause for businesses’ difficulty extracting value from Hadoop investments. Hadoop, its components, and Hadoop Distros, are all innovating very quickly and in different ways. This diversity, while healthy in many ways, also slows Big Data Ecosystem development and limits adoption.

Specifically, the lack of consistency across major Hadoop distributions means that commercial solutions and ISVs – precisely the entities in the value chain whose solutions deliver business value – must invest disproportionately in multi-distro certification and regression testing. This increases their development costs (not to mention enterprise support costs) and suppresses innovation.


Figure 1: Fragmented Hadoop Market Increases ISV Costs, Reduces Innovation and Makes Delivering Business Value Harder


Industry Coalescence

In February 2015, forward-thinking Big Data and Hadoop players at companies as diverse as AltiScale, Capgemini, CenturyLink, EMC, GE, Hortonworks, IBM, Infosys, Pivotal, SAS and VMware decided to work together to address the underachieving industry growth rate through standardization.

Here’s what some of the founding members had to say when they joined:

  • CapGemini: One of the most consistent challenges that we’ve come across is the need to get multiple vendors’ technologies working together. Sometimes this is to get IBM’s data integration or analytics technologies working on another vendor’s distribution, such as Pivotal or Hortonworks, or to run other vendors analytics tools, such as SAS, on top of an IBM Big Data platform.
  • Hortonworks: Some might look at Pivotal and IBM and others as competitors. We have to set those differences aside and focus on the things we can do jointly. That’s what this initiative is about. It just comes from working together and building trust and we’re used to that. It’s really what open source is about. If you look at the Hadoop industry, there are shared name components. There are varying versions of those components that have different capabilities, different protocols and API incompatibilities. What this effort is aimed at is a stable version of those, so that takes the guesswork out of the broader ecosystem.
  • IBM: One desired outcome of the ODP is to bring new big data solutions to the market more quickly. This will be achieved by making it easier for the ecosystem vendors to enable and test on a well-defined common Hadoop core platform.
  • SAS: SAS is not in it to choose sides on Hadoop distribution vendors.  We support all five major distributions — Cloudera, Hortonworks, IBM, MapR and Pivotal — with our applications, and requests continue to pour in for more support of region-specific distributions. SAS will continue our collaboration with all Hadoop vendors.

Anyone else working with multiple distributions of Hadoop will understand the challenges involved. Here are three revealing examples from the last few months, each from a different (unnamed) vendor:

  • Calling an HDFS API to see if an HDFS directory exists. Some distributions don’t throw an exception and instead return null for the directory; others throw an exception.
  • Setting a baseline of Hive 13 so we get access to some new syntax. Try it on one distribution: it works great, and we are able to do some really innovative things. Try it on another that also claims Hive 13, and we get a syntax error.
  • Trying to be a good ecosystem citizen, i.e. leveraging the HCAT APIs for accessing shared metadata. All is good. Then we get the latest “dot” release from the vendor and, guess what, they changed the package name of the class used to get the information. A code change is necessary.
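The first example above (an exists-check that returns null on one distribution and throws on another) is exactly the kind of divergence that forces ISVs into defensive wrappers. Below is a self-contained sketch of such a wrapper; the two Fake* clients and the `get_file_status` name are stand-ins for divergent vendor client libraries, not real Hadoop APIs.

```python
# Sketch of the defensive code the HDFS example above forces ISVs to write.
# The fake clients model the two observed vendor behaviors.

class FakeClientReturnsNone:
    """Vendor A: a missing directory yields None rather than an exception."""
    def get_file_status(self, path):
        return None

class FakeClientRaises:
    """Vendor B: a missing directory raises instead of returning None."""
    def get_file_status(self, path):
        raise FileNotFoundError(path)

def directory_exists(client, path):
    """Treat a None return and an exception the same way: directory absent."""
    try:
        return client.get_file_status(path) is not None
    except FileNotFoundError:
        return False

print(directory_exists(FakeClientReturnsNone(), "/data"))  # → False
print(directory_exists(FakeClientRaises(), "/data"))       # → False
```

A runtime specification makes wrappers like this unnecessary by pinning down a single behavior that every compliant platform must exhibit.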

Getting Organized

In September of 2015, the decision was made to move under The Linux Foundation. The Linux Foundation’s recognized excellence in Open Source governance and community and ecosystem development provided existing and prospective ODPi members with the confidence that the organization would continue to grow while operating openly, equitably and transparently.

Coinciding with the move to the Linux Foundation, several prominent industry players joined ODPi, bringing the total membership to 25 by the end of 2015.

The Linux Foundation facilitated the bylaws and organization of ODPi in order to draw from all of its members’ diverse areas of expertise in the development of the specification. Figure 2 depicts ODPi’s operating structure.


Figure 2: ODPi Operating Structure

Getting to Work

With the Release Team and TSC in place, the hard work of defining the ODPi Runtime and Operations Specifications got underway in earnest in Q4 2015.

The Release Team published the draft Runtime Specification on January 21 and has been hard at work since then on finalizing the spec and developing tests and a deployment sandbox, which will be announced at the end of March. We will also be publishing more details on the Spec and tests, so check back soon!
