All Posts By

John Mertic

ODPi Takes Hadoop Summit: San Jose

By | Blog

The much-anticipated Hadoop Summit is next week, June 28-30 in San Jose, and is one of the Apache Hadoop community’s premier events. It will feature community-chosen technical sessions, business tracks based on real-world implementations and so much more from Hadoop users, developers and ISVs.

As an organization committed to an open ecosystem of big data, we are thrilled to be exhibiting at Hadoop Summit!

During the show, ODPi will be hosting a meetup open to the public. Whether you are attending Hadoop Summit or live in the San Jose area, be sure to join us:

ODPi will host a “War Stories of Making Software Work with Hadoop” meetup! Hear from big data software vendors and application developers who have worked with one or more Hadoop distributions about the technical challenges they have faced, and why they feel ODPi will help simplify this for future ISVs. RSVP for our meetup, held on June 27th at 6pm, here!

Be sure to stop by the ODPi booth (#107) too! We hope to see you at this year’s Hadoop Summit event and at the ODPi meetup!

Apache: BigData North America 2016 – Keynoters Perspective

During Apache: BigData North America 2016 last month, both Alan Gates and John Mertic keynoted the conference.

Alan discussed the age of data and data-defined applications, ODPi’s objectives, and what benefits ODPi brings to the larger big data ecosystem.

John spoke about building a stronger Apache Hadoop ecosystem and how the Apache Software Foundation and ODPi complement one another.

Outside of their keynotes, what was their experience at Apache: BigData? Both speakers have shared their key take-aways and insights.

Alan Gates

With my ODPi hat on, I think the best thing about Apache Big Data was the fact that so many people learned about who we as ODPi are and what we do.  Many people I talked to had never heard of us.  And when we explained our goals and what we have done so far they were excited.  And for many who had heard of us but were confused (are we competing with Apache? are we another distribution? why do we even need the ODPi?) we were able to communicate our mission and answer their questions.

From a more personal perspective, I enjoyed the technical level of the sessions.  The sessions I went to had a high level of interesting technical content, with engineers sharing the work they were doing.  The questions I received in the two sessions I presented on Hive showed that the audience was engaged with Hive, how it works, and the changes that the community is making in the project.  It is always interesting to hear what others are working on and to be able to share your work with others who appreciate and understand it.

To learn more about Alan’s perspective, read his recent Hortonworks blog post, “ODPi Helps ASF Accelerate Big Data Ecosystem.”

John Mertic

This conference was a first for me: I have been to ApacheCon before, but never to the Big Data-specific event. I was also able to keynote it. As any keynoter would say, it is a privilege and an honor first and foremost, but it also gives you a chance to pause and see the event at an entirely different level.

The first thing that stands out is that I got a chance to see Big Data from the innovation creator’s perspective. One project that stood out to me in this regard was Apache TinkerPop. At its base level, it’s a framework for easily doing graph analysis on compatible sets of data. However, what fascinated me was the conversations around the problems you could solve with it. For example, “What if you could build a car recommendation engine with it, that took into account which car(s) you’ve owned, where you are in life (age, family, career, location), and information on the vehicles themselves, to recommend what to purchase?” This higher-level thinking brought me back to the software development conversations I had early in my career, and shows how technologists are getting more use-case driven in their approaches.

The other thing that stood out was that, with all this innovation, there’s still confusion. A stat I heard was that there is a new project coming out of Apache incubation every 6 weeks. That isn’t in and of itself a problem, as I talked with attendees, but the result of that pace of innovation is. The feedback loop is hard to maintain with such an overwhelming firehose. Commercial support dies off quickly after around 15-20 key projects. And the consistency of the Apache Hadoop products that package these projects into a canonical distribution tends to not be there. Attendees expressed frustration with the upstream project and distro vendors.

My keynote focused on these two themes, telling the story that innovation is key, but that it isn’t possible without some level of standardization. I made the statement that the lack of a stable base is what holds Apache Hadoop back, and based on the conversations I had this definitely is true. Attendees I talked to knew that standardization wasn’t a barrier to innovation, but an enabler of it. More specifically, they are able to invest more in Apache Hadoop as a unified ecosystem is built around it.

A key takeaway from the week – Apache Hadoop users see the importance of the technology, but are pushed away by the complexity of finding a great way to engage with their use cases. This is where I see ODPi being able to close that loop.

Our Week at Apache: BigData North America

Last week, ODPi participated in Apache: BigData North America in Vancouver, B.C., as an exhibiting platinum sponsor, where we announced our gold sponsorship of the Apache Software Foundation.

During the show, we connected with the Apache community by delivering two keynotes, holding multiple panel and tutorial sessions, and hosting Apache project office hours at our booth.

Here’s a look back:

ODPi Becomes Gold Sponsor of The Apache Software Foundation

On Wednesday, May 11, we announced our gold sponsorship of The Apache Software Foundation (ASF). We joined existing ASF sponsors Hortonworks, IBM, Pivotal, and WANDisco, who are also ODPi members, to advance open source projects for the big data ecosystem.

“We are pleased to welcome ODPi to the ASF Sponsorship program and their support of Apache big data projects,” said ASF Vice Chairman Greg Stein. “As analysts project that enterprises will fully embrace the Apache Hadoop ecosystem, Sponsor support is even more vital to our success in spearheading industry-leading innovations that are developed in a trusted, community-driven environment.”

This announcement generated positive feedback from the Apache community during the show, as well as many articles – Datanami, Channel Insider, CIO, Linux.com, StorageReview, and IBM DeveloperWorks TV.

Some of our favorite quotes include:

“Both our organizations are laser focused – we want a stronger Hadoop ecosystem,” Mertic said, of the ASF and ODPi. “But let’s be honest, it hasn’t been easy. The Apache project really focuses on those raw components, the peanuts and the chocolate, if you will. But if you’re an end user, you’re looking for that Reese’s Peanut Butter Cup.” – Ian Murphy, Linux.com

“We’re committed to making sure of Apache’s success,” Mertic says. “Our success depends on the upstream project success. Furthermore we’re not looking to do development separate from Apache. We want Apache to own the development process. We want them to own the projects process….We want Apache to keep doing what Apache’s great at, which is building amazing, incubated, governed projects.” – Alex Woodie, Datanami

“John Mertic, director of ODPi, said a group made up of ISVs in the ODPi consortium (which counts IBM, GE, Splunk, EMC, SAS Institute, Infosys and Capgemini among its members) is trying to make sure there are enough checks and balances to prevent Hadoop from fragmenting into a set of incompatible distributions. While each distribution uses the same core base, it’s become apparent that the leading providers of these distributions are creating extensions that could ultimately force ISVs to incur higher costs by having to support applications optimized for a particular instance of Hadoop.” – Mike Vizard, Channel Insider

The conversations did not stop there. Twitter was also buzzing with excitement. Here are some of our favorites:

[Images: tweets from the show]

Keynotes, Panels, and Technical Sessions

Monday

  • Roman Shaposhnik & Konstantin Boudnik from Pivotal discussed How ODPi Leveraged Apache Bigtop to Get to Market Faster (and You Can Too!). Attendees learned how contributions from ODPi members are helping Apache Bigtop get even stronger and provide an integration platform for the next-generation big data technologies.

  • Milind Bhandarkar of Ampool dove into a riveting story Standing on Shoulders of Giants: Ampool Story and also discussed how Ampool is contributing to several ASF projects in ODPi.

  • ODPi panel, ODPi: Advancing Open Data for the Enterprise, gave attendees an overview of how Hadoop distributors, ISVs, SIs, and enterprises (end users) will benefit from standardization. Panel members included (from right to left) Milind Bhandarkar of Ampool, Roman Shaposhnik of Pivotal, Alan Gates of Hortonworks, and Susan Malaika of IBM. Moderated by Gregory Chase of Pivotal.

Tuesday

  • Alan Gates from Hortonworks provided a thought-provoking Keynote: ODPi 101: Who We Are, What We Do and Don’t Do. Gates outlined the ODPi Core, a set of software components, a detailed certification and a set of open source tests to make it easier to create big data solutions and data-driven applications.

Wednesday

  • In his keynote, John Mertic from ODPi presented ODPi and ASF: Building a stronger Hadoop ecosystem, where he detailed how the ODPi’s specifications and by-laws reinforce the role of the ASF as the singular place where Apache Hadoop development occurs. He also announced ODPi’s gold sponsorship of ASF during his keynote!

  • John Mertic of ODPi and Jim Jagielski of the Apache Software Foundation teamed up for a riveting conversation about ODPi and ASF Collaboration. Attendees asked questions and learned about how the ASF and ODPi complement each other, where the big data ecosystem is heading, and more.

Apache Project Office Hours

On Tuesday during Apache: Big Data, we hosted Apache project office hours, where any project could sign up for 30-minute time slots and host its meeting in our lounge. Communities from Geode, TinkerPop, Spark + RDBMS, HAWQ, and MADlib participated.

ODPi Booth Activity

[Images from the ODPi booth]

Overall, it was a wonderful week. We are looking forward to Hadoop Summit, June 28-30 in San Jose. Hope to see you there!

Altiscale, Capgemini, IBM, and Unifi Software Discuss the Future of Big Data

Last week we were lucky enough to have some time to sit down with ODPi members Steve Jones, global vice president of Big Data at Capgemini, Mike Maciag, COO at Altiscale, Sean Keenan, co-founder and vice president of products at UNIFi Software, and Todd Moore, vice president Open Technology at IBM, to discuss the future of big data, cognitive computing and how ODPi can help drive innovation. Todd Moore also expanded on his thoughts from his recent blog post on ODPi.

Here is what they had to say:

1. What are the challenges customers face with Hadoop and how will ODPi solve them?

Maciag, Altiscale: ODPi makes it much easier to select a Hadoop platform, select applications that will run successfully on that platform, and switch platform providers in the future, if necessary. Hadoop is actually a large ecosystem of different software projects and each component gets updated and released on a different schedule. As a result, customers today have a tough time making a decision about which platform to pick and application vendors have a hard time making their apps work on multiple platforms. With the ODPi standard, customers have the confidence that they are picking a certified solution with an array of applications that have been made for it and will run successfully. And if you don’t like the support that you’re getting from your platform provider, you can switch with confidence, since your applications will still work. The existence of a standard will allow the Hadoop ecosystem and the applications that run on it to flourish, leading to greater business value for customers.

Jones, Capgemini: The objective of ODPi is to develop a stronger and more innovative ecosystem by providing vendors and companies a firmer and more assured base from which to work. The challenges have been the poor interoperability and fragmented initiatives in the market and competing technical solutions. ODPi makes it easier to deploy big data solutions and data-driven applications for a whole range of use cases, because of the cross-compatibility and standardization of this common reference platform.

Moore, IBM: Compatibility has been a huge issue for customers, partners and application ISVs who write on top of Hadoop. ODPi will allow developers and organizations to have confidence that a distribution will most likely be able to run the same applications as other distributions.

ODPi will drive interoperability and compatibility by providing a common platform against which to certify applications. It also opens up choices for developers by enabling this interoperability with different distributions within an organization.

The ODPi test framework and self-certification also align closely with the Apache Software Foundation by leveraging Apache Bigtop for comprehensive packaging, testing, and configuration. More than half the code in the latest Bigtop release originated in ODPi.

Keenan, UNIFi: ODPi is providing standards sorely needed within the Hadoop ecosystem. This will enable a more mature, solidified, and hardened stack as vendors continue to adopt this standard. It will remove the barriers of distribution and vendor lock-in.

2. What do you see for ODPi in the future?

Maciag, Altiscale: ODPi provides reassurance to our customers that whatever they build on Altiscale will be something that they can use in other places and in other situations. It gives them confidence in using a vendor. It also allows them to run applications in a hybrid environment, for example, using Hortonworks on premises and Altiscale in the cloud. It also allows customers to easily transition from an on-premises environment to a fully cloud one. We are talking to many customers who want to make a transition to the cloud and they are running on a different Hadoop distro today. Since Altiscale is ODPi compliant, they are comfortable that they can easily make that transition.

Jones, Capgemini: The next question for ODPi is what other pieces need to be standardized but also how to move towards a greater degree of verification-based compliance. Certainly as our clients see the tangible proof of how ODPi can reduce complexity and costs, we can expect adoption to accelerate.

Moore, IBM: We expect ODPi to grow. There is significant innovation happening in the industry right now and we like that ODPi will be able to provide solutions for these challenges. By working together with the other member companies, we can enable organizations to better innovate with Hadoop and other big data technologies thanks to standardization.

Keenan, UNIFi: ODPi has made great strides and is quickly maturing into a standard with a groundswell movement behind it. Our expectation and belief based on these early successes is that this program will continue to see additional vendors and platform providers get on board. Hadoop has won the battle as the analytic platform of choice; now it is time to build the appropriate standards to accelerate innovation and solution development.

3. Can you share some ways ODPi benefits your customers?

Maciag, Altiscale: The members of ODPi strongly believe that customers and the value that they get from Hadoop should drive the industry, not any particular vendor.  When customers win, the industry and vendors win, too. The release of the runtime spec shows that standards can be established, that the vendors can work together. In the future, I look forward to the expansion of the spec and you’ll start hearing more and more about the real world benefits that this provides.

Jones, Capgemini: ODPi helps customers by enabling an ecosystem of suppliers who can work across multiple Hadoop vendors.  This enables procurement and IT departments to focus on supporting the business ambitions rather than sorting out the technical issues of interoperability and competing technologies.

Moore, IBM: Within the IBM ecosystem, big data plays a tremendous role. As we are beginning to dive deeper into the cognitive world, there has been an explosion of structured and unstructured data feeding into it. Hadoop, Big Data, machine learning, they all play into how this cognitive world will evolve. So having a platform to count on, starting with a small, specific and consistent packaging model that lives within the ecosystem, is priceless. As people take advantage of this and deploy into their own cloud infrastructure, they will be able to test once and run everywhere. In this way, standardization is an enabler and is a huge deal as we move into the cognitive era. IBM is looking forward to distributing ODPi applications. This will be a key differentiator for IBM to serve customers.

Keenan, UNIFi: Our customers will benefit by reaping the rewards of a more mature data processing platform that they can build their future analytics initiatives on. They also benefit from a UNIFi perspective as we’ll be able to spend more resources delivering features, capabilities, and solutions that solve their business problems versus spending cycles supporting all variants of services across a non-standards-based processing platform.

4. What advice would you give to others looking to join ODPi?

Maciag, Altiscale: If you’re an application vendor looking to accelerate your business and find more customers, this is the place to be. ODPi certification takes friction out of the sales process and provides peace of mind in a rapidly changing Hadoop ecosystem. By joining ODPi, you get to help influence the standard and know sooner about upcoming developments, so that you can take advantage of them more quickly. The membership of companies today really represents leading thinkers and makers in Hadoop, and so it’s a great group to join.

Jones, Capgemini: Get involved, look at what you want ODPi to do next in practical terms, and join with the ambition of helping to make that happen.

Moore, IBM: You do not have to be a member to join. Goals are self-explanatory, organizations can self-certify and everything can be found on GitHub. If you are a company, ODPi is a great community and has made significant progress, quickly. Please join us and participate in this evolution. It is certainly something Hadoop has needed for years and years.

Keenan, UNIFi: We would encourage participation. This consortium is good for the community and market as a whole and will only create additional value as more members contribute to the program.

5. Why did Capgemini join ODPi?

Jones, Capgemini: Capgemini joined ODPi because we believe in open standards’ ability to drive innovation. From our involvement with Open Group, to more technical and sector-focused standards efforts, we’ve consistently found that standards help drive innovation as well as, of course, bringing the benefits of streamlining and cost efficiency.

ODPi Goes to Apache: Big Data North America!

Apache: Big Data North America is next week, May 9-12 in Vancouver, Canada, and will bring together a kaleidoscope of big data devotees from the Apache project community to further the education and advancement of Apache open source projects. As an upstream contributor to the Apache Software Foundation and an organization that strives to build upon the innovative work of ASF, we are thrilled to be exhibiting and speaking at Apache: Big Data!

Together with a few of our member companies (Ampool, IBM, Hortonworks, and Pivotal), we will show attendees how ODPi will ease integration and standardization for downstream application vendors and end-users that build upon Apache Hadoop®.

The ODPi booth (#11) on the exhibit floor will feature a Hacker’s Lounge where attendees can come by and meet industry innovators like Ampool Founder and CEO, Milind Bhandarkar, Ph.D.

As mentioned, there are a number of great ODPi-focused sessions during the show. Here’s an overview of our must-see sessions:

Monday

  • Roman Shaposhnik & Konstantin Boudnik from Pivotal will discuss How ODPi Leveraged Apache Bigtop to Get to Market Faster (and You Can Too!). Attendees will learn how contributions from ODPi members are helping Apache Bigtop get even stronger and provide an integration platform for the next generation big data technologies.
  • There will be an ODPi panel, ODPi: Advancing Open Data for the Enterprise, where attendees will receive an overview of how Hadoop distributors, ISVs, SIs, and enterprises (end users) will benefit from standardization. Panel members include John Mertic of ODPi, Milind Bhandarkar of Ampool, Terrance Yim of CaskData, Alan Gates of Hortonworks, and Susan Malaika of IBM. Moderated by Roman Shaposhnik of Pivotal.
  • Milind Bhandarkar of Ampool dives into a riveting story Standing on Shoulders of Giants: Ampool Story and also discusses how Ampool is contributing to several ASF projects in ODPi.

Tuesday

  • Alan Gates from Hortonworks provides a thought-provoking Keynote: ODPi 101: Who We Are, What We Do and Don’t Do. Gates will outline the ODPi Core, a set of software components, a detailed certification and a set of open source tests to make it easier to create big data solutions and data-driven applications.

Wednesday

  • John Mertic from ODPi showcases an important Keynote: ODPi and ASF: Building a stronger Hadoop ecosystem, where he details how the ODPi’s specifications and by-laws reinforce the role of the ASF as the singular place where Apache Hadoop development occurs.
  • John Mertic of ODPi and Jim Jagielski of the Apache Software Foundation team up for a riveting conversation ODPi and ASF Collaboration. Attendees will learn about how the ASF and ODPi complement each other and work together, where the big data ecosystem is heading, and more. Mark your calendar now, you will not want to miss this session!

A supporter of the work the ASF has done with the Apache Hadoop ecosystem since our inception, we are looking forward to supporting, connecting with, and learning from the Apache project community on advancing the big data ecosystem.

We hope you will join us at Apache: Big Data!

Register now for Apache Big Data.

Making the Case for ODPi: How We Are Furthering Hadoop and Big Data Ecosystem

As a Linux Foundation project, we are lucky to benefit from the vast experience the foundation has in dealing with open source projects experiencing rapid growth. With the organization’s support and our 25 members working together, we believe we can increase adoption and open opportunities for innovation on top of an already strong Hadoop community.

With our first release – ODPi Runtime Specification – behind us, we thought we’d share insights from several of our founding members on how ODPi fits into the Hadoop and Big Data ecosystem. This panel discussion, featuring executives from Hortonworks, Pivotal, IBM and Reactor8, also covered what kinds of companies will benefit from developing new standards across the ecosystem.

How Hadoop Can Help Organizations Manage and Take Advantage of Big Data

Organizations across industries are struggling to manage the massive amounts of data available to them. In response, Hadoop is quickly replacing data warehouse platforms, offering a distributed processing framework designed to address the volume and complexity of big data environments involving a mix of structured, unstructured and semi-structured data.

As Hadoop has matured over the last decade, it has proven to be a reliable and popular platform among developers requiring a technology that can power large, complex applications. However, Hadoop components and distributions are innovating very quickly and in many different ways. It is being widely adopted. We believe that by cooperating in ODPi we can increase the accelerating rate of adoption by lowering the barrier to entry.

How ODPi Fits Into the Big Data Ecosystem

ODPi was created to complement the Apache Software Foundation and help companies use Apache Hadoop more effectively. Our main focus as we go forward with the first release of our Runtime Spec is being able to write applications that sit on top of big data stacks. Our aim is to develop cross-industry standards that enable developers to easily write these applications and ensure that they interoperate across systems.

[NOTE] We recently published the ODPi Runtime Specification and test suite to ensure applications will work across multiple Apache Hadoop® distributions.

Thus far, ODPi has focused on core Hadoop technologies including HDFS, MapReduce and YARN for its Runtime Spec and Ambari for the Operational Spec. Over time we look to expand this to projects including Hive, Spark, and more, for a holistic offering that covers what diverse companies are using and that is driven by the industry needs that our members see in the ecosystem.

Are we open to other management tools outside of Ambari? Alan Gates, Co-founder at Hortonworks, responded to this question from an audience member, noting that “there is not conformity of opinion yet about the answer.” However, we work closely with Ambari, and in that regard, as Gates continues, “this is where ODPi complements Apache instead of competes. What we are trying to do is say ‘here’s how to use this well’ and when people have new features or whatever they want in Ambari, we help them feed that back upstream into the Apache community. We don’t try to push that onto the Apache community.”

If Ambari doesn’t have a crucial feature that a company needs when they are looking at joining, ODPi helps share that feedback upstream to Ambari and works to get that feature included. However, other distros may choose to not manage their distros with Ambari – our members can be certified on the Runtime Spec without working with Ambari.

It is important to note, according to Roman Shaposhnik, founder of Apache Bigtop and director of Open Source at Pivotal, that there is “a pretty good upside to being comparable with how Ambari manages your applications because you want to benefit from as many applications that exist within the Hadoop ecosystem. If those applications all get to be managed by Ambari and you start managing them with something else, well then you have to go back to the application data” and work on ensuring interoperability.

Shaposhnik continues, highlighting that he thinks “that there is enough of an upside to make it interesting to application developers, but whether we see other management tools falling into the footsteps of the Spec, and becoming more like Ambari, I hope we will, but there is no guarantee.”

What Kind of Companies Will Benefit from Standardization?

Companies across the Hadoop and big data ecosystem will greatly benefit from standardization. The panel covered several of these types of companies, including:

  • ISVs and SIs: Depending on the vendor, Hadoop features can vary significantly. A huge benefit ODPi offers for ISVs and SIs, as we produce specifications that touch on configuration etc., is that they can go to any ODPi-certified vendor and know that their application will work. They can test against the ODPi distribution and have confidence that it will work against the whole stack.

  • End Users: Hadoop upgrades are a struggle for many companies. For example, there are a lot of Internet-scale companies that maintain Hadoop operations either directly or in their operations. It would be very beneficial for these companies to externalize their use case through the test and validation aspects of what ODPi is working on, especially if they are using an obscure Hadoop feature or a legacy system, a unique situation where Hadoop supports it, but they are one of the few that actually enable it. Unfortunately, unless they bring this information to us, we can’t help them. These companies don’t have to join ODPi since we are open to all, but as a member and part of our User Advisory Board, they can help us integrate it into our workflow, which would speed up how fast they could rev up their internal distributions of Hadoop.

ODPi 2016 Release Schedule and Roadmap

Forrester Research’s big data analysts recently stated in a new research report, Forrester Wave™: Big Data Hadoop Distributions, Q1 2016, that adopting Hadoop is “mandatory” for any organization that wishes to do advanced analytics and get actionable insights on their data. Yet Forrester also estimates that between 60% and 73% of data that enterprises have access to goes unused for business intelligence and analytics.

With application developer and delivery professionals adopting Hadoop “en masse,” the analyst firm predicts that 100% of all large enterprises will adopt Hadoop and related technologies such as Spark for big data analytics within the next two years.

With such industry need for Hadoop, ODPi is looking forward to helping make adoption and interoperability easy for enterprises in the Hadoop ecosystem. We just announced our first release and have a strong release schedule and roadmap for 2016.

ODPi Release Schedule

  • March 31: The first release of the ODPi Runtime specification and test suite, which will ensure applications work across multiple Apache Hadoop® distributions.
  • June/July: The first release of the ODPi Operational specification, run on Ambari 2.2.0.0, to help enterprises improve installation and management of Hadoop and Hadoop-based applications.
  • September: The second release of the ODPi Runtime specification and test suite.

ODPi Roadmap

  • Establish user advisory council with six end user members.
  • Ensure that three Hadoop vendor members ship ODPi Runtime-compliant distributions.
  • Enable ISVs to indicate their compatibility with the ODPi reference implementation.
  • Include member-confirmed projects in next release.
  • Plan certification programs to avoid splintering the code. A roadmap beyond one-size-fits-all certification and solving future customer challenges today.

We look forward to Hadoop distribution vendors becoming ODPi Runtime Compliant. Here’s how the process works.

Runtime Spec Compliance and Certification

  • ODPi Runtime Compliance is achieved through self-certification by Hadoop Distro vendors, which takes a few days tops.
  • The ODPi Test Framework is based on Apache Bigtop.
  • Bigtop is an Apache Software Foundation project for Infrastructure Engineers and Data Scientists looking for comprehensive packaging, testing, and configuration of the leading open source big data components. Bigtop supports a wide range of components/projects, including, but not limited to, Hadoop, HBase and Spark.


  • BONUS: Over half the code in the latest Bigtop release originated in ODPi.
  • All ODPi Runtime Compliance tests are linked directly to lines in the ODPi Runtime Specification.
  • To assist with compliance, in addition to the test suite, ODPi also provides a reference build.

Become a member of ODPi: visit www.odpi.org/members and scroll to the bottom, or get involved with the ODPi project on GitHub by signing up for our mailing list or sending pull requests. Membership is not a requirement to become involved with ODPi’s technology, and all development is done in the open. Visit www.github.com/odpi/specs.

Our Week at Strata + Hadoop San Jose

By | Blog

Last week was a sensational week for ODPi. We announced our first release, the ODPi Runtime Specification, to positive feedback, and we participated in Strata + Hadoop San Jose via our member booths.

Here’s a look back:

Our First Release – ODPi Runtime Specification

On Monday, March 28th, we announced the first release of the ODPi Runtime Specification and test suite to ensure applications will work across multiple Apache Hadoop® distributions. This announcement generated a flurry of media coverage from CMSwire, ZDnet, SiliconANGLE, Datanami, Data Informed, ITBusinessEdge, CIO, InfoWorld, The Register, Computer Business Review, The New Stack, and Fierce Big Data, to name just a few.

Some of our favorite quotes include:

  • “ODPi could be a game-changer for enterprises that plan to embark on big data implementations because they won’t have to spend their precious time and resources worrying whether one Hadoop leveraging solution or application will be compatible with the next and the next because they’ll all be based on the same core.” – Virginia Backaitis, CMSwire
  • “One major Hadoop distributor reputedly employs 40 people just to ensure compatibility among products. By standardizing the Hadoop stack, the ODPi hopes to boost compatibility, cut down on complexity and reduce the need for testing, which are becoming big problems that threaten to slow adoption of the platform.” – Alex Woodie, Datanami
  • “Backed by Hortonworks, IBM, GE and the Pivotal arm of EMC, Mertic says ODPi is designed to complement the effort of the Apache Software Foundation that is charged with developing the raw bits that make up Hadoop and related open source projects such as YARN and HDFS.” – Mike Vizard, ITBusinessEdge

In addition, several members took to the blogging world and expressed support for the runtime specification:

The conversations did not stop there. Twitter was buzzing with views on the runtime specification. Here are some of our favorites:

Screen Shot 2016-04-06 at 10.15.13 AM.png

Strata + Hadoop San Jose

We joined many of our members at Strata. Companies like Altiscale, DataTorrent, EMC, Hortonworks, IBM, Infosys, Pivotal, SAS, Unifi, and VMware all proudly displayed their ODPi membership at their individual booths. Here are some of our favorite member moments:

SocialPicture4.png

Overall, it was a wonderful week. We are looking forward to Apache: Big Data, May 9-12 in Vancouver, BC. Hope to see you there!

ODPi FAQ

By | Blog

Who We Are

ODPi is a nonprofit organization committed to simplification and standardization of the big data ecosystem with a common reference specification called ODPi Core.

As a shared industry effort and Linux Foundation project, ODPi is focused on promoting and advancing the state of Apache Hadoop® and big data technologies for the enterprise.

The rapid influx of digital information available to enterprises has resulted in a big data ecosystem that is challenged and slowed by fragmented, duplicated efforts. ODPi’s members aim to accelerate the adoption of Apache Hadoop and related big data technologies with the goal of making it easier to rapidly develop applications through the integration and standardization of a common reference platform called the ODPi Core.

Where We Are Today

  • ODPi currently has 26 members and more than 35 maintainers from 25 companies dedicated to its ongoing work.

  • Membership investments have nearly doubled since ODPi was announced in February 2015.

  • ODPi is open to all, with a very low hurdle for any developer or company to participate and have an impact.

Key Points

  1. ODPi provides cross-compatibility between different distributions of Hadoop and big data technologies.

    1. ODPi Core specifies how Apache components should be installed and configured and provides a set of tests for validation to make it easier to create big data solutions and data-driven applications.

    2. ODPi Core is not a distribution; it’s an industry-standard deployment model over which the industry can build enterprise-class big data solutions.

  2. The fragmented Hadoop market increases ISV costs, reduces innovation and makes delivering business value harder. ODPi, by solving these problems, fills a gap in the big data ecosystem.

    1. To overcome the interoperability and fragmentation challenges this industry faces, it will take all of us working together. Linux is a great example of how open source can speed innovation and market transformation – that’s what we’re doing at ODPi.

    2. Organized to support the ASF, ODPi promotes innovation and development of upstream projects like Hadoop and Ambari.

    3. Now 10 years old, Hadoop has become a mature technology that serves hyperscale environments and can handle widely varying amounts and types of data. It’s a proven and popular platform among developers requiring a technology that can power large, complex applications.

    4. Yet, Hadoop components and Hadoop Distros are innovating very quickly and in many different ways. This diversity, while healthy in many ways, also slows big data ecosystem development and limits adoption.

    5. The industry now needs more open source-based big data technologies and standards so application developers and enterprises are able to more easily build data-driven applications.

  3. The ODPi Core removes cost and complexity to accelerate the development of big data solutions.

    1. ODPi helps the three key ecosystem players:

      1. Hadoop Platforms (distros): ODPi compliance guidelines enable ODPi-compatible software to run successfully on their solutions. The guidelines allow providers to patch their customers in an expeditious manner to deal with emergencies.

      2. ISVs/SIs: ODPi compatibility guidelines allow them to “test once, run everywhere,” eliminating the burden and cost of certification and testing across multiple distributions. They can have a predictable release cadence to reduce maintenance and support costs.

      3. Enterprises (end users): Ability to run any “ODPi-compatible” big data software on any “ODPi-compliant” platform and have it work.

  4. ODPi will bring value to the market by:

    1. Standardizing the commodity work of the components of a Hadoop distribution

    2. Providing a common platform against which to certify apps, reducing the complexities of interoperability

    3. Ensuring a level of compatibility and standardization across distribution and application offerings for management and integration

FAQ: Project Scope and Roadmap

Q: How is testing administered? What is the process for becoming ODPi compliant?  

A: Testing is currently self-administered. To become ODPi compliant, vendors must submit test results for the product release they would like certified. They do not have to comply with every specification for every product release.

This GitHub repository is where vendors can commit their ODPi spec test runs to let others know when their distro is compliant. Instructions on how to report self-certification are also included.

Q: How long is the testing process to become ODPi-certified?

A: The specification has just become available, but several members that have been planning to do the validation believe running the tests will take only 20 minutes, making it a one- to two-day effort at most overall.

Q: Can you explain the ODPi release cycle?

A: ODPi will continue developing the Runtime Specification with updated releases coming every six months. After the March release, expect another in October 2016. The ODPi Operations Specification 1.0 is expected late this summer.

Q: When will the Operations Specification be published?

A: The ODPi Operations Specification is the other piece of the ODPi Core puzzle.  It will help improve installation and management of Hadoop and Hadoop-based applications and will be available in late summer.  The Operations Specification covers Apache Ambari, the ASF project for provisioning, managing, and monitoring Apache Hadoop clusters.

Q: How does ODPi complement the Apache Software Foundation (ASF)?

A: The Apache Software Foundation supports many rapidly growing open source projects. ODPi, a shared industry organization, complements that work and is solely focused on easing integration and standardization within the Hadoop ecosystem.

ODPi is also contributing to ASF projects in accordance with ASF processes and Intellectual Property guidelines. ODPi will support community development and outreach activities that accelerate the rollout of modern data architectures that leverage Apache Hadoop. For example, ODPi contributes back to projects like Ambari and Bigtop, with more than half the code in the latest Bigtop release coming from ODPi.

Q: How do I get involved?

A: Membership is not a requirement to become involved with ODPi’s technology, as all development is done in the open. Visit www.github.com/odpi/specs. Get involved with the ODPi project on GitHub by signing up for our mailing list, sending pull requests, or giving us feedback at https://jira.odpi.org. Our governance model offers one-member, one-vote equality.

Q: How is ODPi governed and managed?

A: ODPi runs under an open governance model that offers one-member, one-vote equality. This ensures our members bring a balanced representation of the big data ecosystem with a perspective and expertise well beyond Hadoop.

Q: What is the role of The Linux Foundation with ODPi?

A:  ODPi is a Linux Foundation project that is independently funded. It harnesses the power of collaborative development to fuel innovation across the big data ecosystem. By aligning with The Linux Foundation, ODPi is able to leverage best practices for community governance, operations and development the organization established running Linux. www.linuxfoundation.org

Member Companies Ampool, DataTorrent, Hortonworks, Linaro, Pivotal and SAS Share Thoughts on ODPi

By | Blog

We had a chance to catch up with some of our member companies ahead of Strata + Hadoop San Jose and ODPi’s first release of the ODPi Runtime Spec and Test Suite to hear what their plans are for the show, why they joined ODPi and what they are looking forward to most in ODPi Core.

Here’s what Ampool Founder and CEO Milind Bhandarkar, Ph.D.; DataTorrent Architect and Co-founder Thomas Weise, who is also a member of the Apache Apex (incubating) PPMC; and Martin Stadtler, Director of the Linaro Enterprise Group (LEG), had to say.

1. How do you anticipate ODPi changing the big data ecosystem?

Ampool: Standardization is very helpful and powerful in a technology ecosystem, as it provides compatibility. Standardization on how the various components interact with each other and having standard APIs will give a boost to the entire ecosystem, as it will let us focus on our product rather than spending a lot of time on ensuring compatibility.

A startup has to look at ease of use and faster time to delivery in the larger big data ecosystem, without worrying about different version compatibility. ODPi will help reduce the complexity and set-up time associated with ensuring compatibility. ISVs can now focus time on the business problems of their customers. We can innovate rapidly! This will help make Hadoop consumable for enterprises. ODPi is an essential step in the right direction.

DataTorrent: We expect it to simplify developing and testing applications that work across distros and hence lower the cost of building Hadoop based big data applications. DataTorrent, for example, can certify RTS installation and runtime for ODPi and know it will work with multiple platform providers.

Linaro: Removing fragmentation, and allowing the big data application vendors to focus on innovating their offerings and not needing to port to various Hadoop releases.

2. What are the aspects of the ODPi Core that you are looking forward to most for your company?

DataTorrent: Standardized runtime environment with core dependencies YARN and HDFS. Test/certify once, install and run on any ODPi compliant distro.

Linaro: Standardization of Hadoop based on ODPi, optimized for the ARMv8 (64-bit) ecosystem.

Hortonworks Sr. Director of Alliance and Partner Marketing, Robin Liong, shares his thoughts on ODPi Core and HDP Hadoop Core.

Pivotal Software Data Evangelist, Jeff Kelly, shares his thoughts in “ODPi To Cap Its First Year with Runtime Specification for Apache Hadoop.”

3. Why did you join ODPi?

Ampool: I have been involved in the big data ecosystem for more than 10 years. When I was at Yahoo as an engineer, I had a chance to work on a project that became Hadoop in 2006. It was a small project deployed on 20 machines at the time, but when I left the company in 2010 it was running on 45K machines. Hadoop truly had become the backbone of Yahoo’s data infrastructure.

Over the last six years, a lot of projects have emerged and become popular in the Hadoop ecosystem, like Hive, HAWQ, etc. Given the momentum of these different additions, integration quickly became an important need in the Hadoop ecosystem. Customers started paying attention to integration ability. There are multiple Hadoop distributions with different versions, and the APIs were not compatible. For an ISV who wanted to build on top of this ecosystem, it became challenging and time consuming to test each product for compatibility with all of the different distributions and versions out there, 21 in total.

Ampool wanted to join a project that would help standardize Hadoop to alleviate the challenge for ISVs by reducing testing overhead. As a startup, we do not have the manpower to do this, so we wanted to increase our compatibility for our customers.

DataTorrent: DataTorrent has been committed to cross-distro compatibility from the start. We understand the challenges and effort involved in supporting incompatible platforms and their fine nuances from our own experience. We are looking forward to ODPi making this easier.

Linaro: Linaro is focused on enabling open source technologies on the ARM platform. It is important to ensure multiple architectures have equal support in the data center to provide a choice to big data users.

SAS Vice President of Platform R&D, Craig Rubendall, shares his thoughts in “Why SAS joined the Open Data Platform (ODP) initiative.”

4. What advice would you give to ISVs looking to join ODPi?

Ampool: ODPi is structured in such a way that platinum members do not run everything. Every voice, even from small startups like Ampool, is heard and your time investment toward this standardization is not superseded by big companies. You have input and are getting in on the ground floor of this effort. It is one member, one vote. I encourage ISVs to get involved early so they can make an impact on this whole effort.

DataTorrent: It is important that ISVs actively participate in the initiative as there are many unique perspectives and broad input is needed to achieve the goals.

Linaro: Invest effort into achieving the shared goals that all members can benefit from, as our members do within Linaro for the ARM ecosystem.

5. How do you see ODPi working with ASF?

Ampool: ODPi will take the innovation coming from Apache and help standardize the projects in the Hadoop ecosystem. It is complementary to the developer ecosystem.

6. What can people at Strata expect to see from your company at the show?

Ampool: We started building our product in August 2015, and we have been conducting a few POCs with it. At Strata, we will be announcing its availability to Hadoop customers and demonstrating the solution on top of ODPi-member distributions, including Hortonworks Data Platform.

Linaro: Parity of processor support between ARM and x86 for Hadoop based on the ODPi specification.