All Posts By

John Mertic

Is Your Data Clean or Dirty?


Over the weekend I read an incredible post from SAS Big Data evangelist Tamara Dull. I love her down-to-earth and real-life perspectives on Big Data, and her analogy of cleaning the car hit home for me. She is spot on – clean data pays dividends in being able to get better insights.

But, what is clean data? What is that threshold that says your data is clean versus dirty?

Could data even be “too clean”?

(pause to hear gasps from my OCD readers)

Clean data and clean houses

Taking this to a real life example, I can say first hand there are often different definitions of what clean is. For example, my wife is very keen on keeping excess items off our kitchen counters, to the point where she’ll see something that doesn’t belong and put it in the first cabinet or drawer she encounters that has space for it. Me on the other hand, I’m big on finding what I believe is the right place for it. Both of us have the same goal in mind – get the counters clean.

To each of us, there’s value in our approaches – which is efficiency. Hers is optimized at the front end, mine at the back end. However, the end result of each of our “cleaning” could have negative impacts (with my approach, it’s my wife’s inability to find where I put something – with my wife’s method, it’s having items fall out of a cabinet on me as I open it).

Is “clean” to one person the same as everyone?

The life lesson above teaches something critical about data – clean isn’t a cut-and-dried threshold. And taking a page from Tamara’s post, it’s also not a static definition.

The trap you can quickly fall into is thinking of this data in the same terms as you would structured data. While yes, part of the challenge is to understand what the data is and its relationships, the more crucial challenge is how you intend to consume and then use the data. This is a shift from the RDBMS mindset of focusing on normalization and structure first and usage second. With the Big Data-esque ways of consuming and processing data (streaming, ML, AI, IoT) combined with velocity, variability, and volume, the use-case mindset is exactly where your focus should be.
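To make that use-case framing concrete, here is a minimal sketch (all data, field names, and rules invented for illustration) of judging the same records as "clean" or "dirty" relative to two different use cases, rather than against one global schema:

```python
# Illustrative only: the same dataset can be clean enough for one use case
# and not another, so "is it clean?" is answered per use case.

sensor_readings = [
    {"device": "t-100", "temp_c": 21.4, "ts": "2016-07-01T12:00:00"},
    {"device": "t-100", "temp_c": None, "ts": "2016-07-01T12:01:00"},
    {"device": "t-200", "temp_c": 19.8, "ts": None},
]

# A streaming alert only needs a usable temperature value...
def clean_for_alerting(row):
    return row["temp_c"] is not None

# ...while trend analysis also needs a timestamp to order readings.
def clean_for_trends(row):
    return row["temp_c"] is not None and row["ts"] is not None

alert_rows = [r for r in sensor_readings if clean_for_alerting(r)]
trend_rows = [r for r in sensor_readings if clean_for_trends(r)]

print(len(alert_rows))  # 2 rows are clean enough for alerting
print(len(trend_rows))  # only 1 row is clean enough for trend analysis
```

The point of the sketch: neither rule is "the" definition of clean – each one is only as strict as the question being asked of the data.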

“Use case first approach” is how we look at these technologies at ODPi. We look at questions like “Here is the data I have, and this is what I’m trying to find out – what is the right approach/tools/patterns to use?” and how they can be answered. We ensure all of our compliant platforms, interoperable apps, and specifications have the components needed to enable successful business outcomes. This provides companies the peace of mind that they are making a safe investment, and that switching tools doesn’t mean that their clean data becomes less than optimal to leverage the way they want.

This parallels the discussion of cleaning in our house – are we trying to clean up quickly because company is coming over, or are we trying to go through an entire room and organize it? Approaching data cleaning involves the same thought process.

Is Open Source Big Data a broken promise?


An article caught my eye this past week, in which Robert Hof of SiliconAngle asserted that the challenges of Apache Hadoop adoption are a byproduct of the open source development approach. Hof argues that the various pieces do not integrate well together and that some projects are not living up to their promises, which has resulted in organizations doing additional work before they see the technology’s true value. This has led to a small pool of available talent and end-customers that are uncertain about where to direct their investments.

On the heels of this article, I watched the video below from Rakesh Kant of US Bank, which I found just as insightful.

His sentiment rings loud and clear:

  • “I’m not seeing any signal, only noise.”
  • “The landscape is evolving into more experiments”
  • “A standard is required to help businesses”
  • “I’d like to focus time on delivering business value”

The Hadoop ecosystem has always been a technology-focused one, and it’s clear this technology has been groundbreaking and impactful. However, I do think that, over time, this technology has evolved to solve the needs of technologists. Enterprises have largely been left without a voice, struggling to embrace it with confidence.

In my view, open source as a development model is not the problem. Rather, it’s the lack of feedback from end-users like US Bank into the process. ODPi would like to solve this problem and help end-users share their feedback.

If you are an end-user of Hadoop, we’d love to have you as part of our End User Advisory Board to discuss these issues and help us focus on making adopting these technologies less risky for you.

My Experience at Global Big Data Summit: Discussing the Importance of Standards


I had a good day last week presenting at the Global Big Data Summit in Santa Clara. The tail end of the last day of any conference is a bit slow, but I was thrilled when many attendees came barreling in right as I was ready to start working through my slide deck, which made the case for the importance of standards, like ODPi, in driving future investment in Big Data and Apache Hadoop.

I had one critical question after the talk that I thoroughly enjoyed answering. A gentleman pushed back on my point that standards need to be the focus. In his experience, staff training and education were the biggest concerns, and it didn’t make sense to focus on standards until a critical mass of developers and practitioners were properly trained first. It was a fair argument – and one that Gartner has also identified as a key blocker to Apache Hadoop growth – but to me it treats the symptom more than the core issue, and I pushed back, saying that standards enable better education and enablement. My point made sense to him, but I walked away wanting to discuss this more in a blog post with better data points behind it. After all, we are in the data industry and should be data driven!

If there is one industry where standards are at the forefront, it’s education. Education standards are a very touchy subject (disclaimer here – I’m a parent of 4 school-aged children and good friends with several educators), and while I’ll attempt to steer clear of the execution for this article, the concept being driven makes perfect sense. Do the skills a first grader has in one state equate with those in another state? What are reasonable benchmarks for defining competency? Can trends in learning/teaching methods and outcomes be better correlated?

I came across an interview with a leader in educational standards entitled “How and Why Standards Can Improve Student Achievement: A Conversation with Robert J. Marzano”. The interviewee offered some interesting insights that drew parallels to the critical question I received at the talk. Here are a few quotes from the interview and their relation to Apache Hadoop standards:

“Standards hold the greatest hope for significantly improving student achievement. Every other policy mandate we’ve tried hasn’t done so. For example, right after A Nation at Risk (Washington, DC: U.S. Department of Education, 1983) was published, we tried to increase academic achievement by making graduation requirements more rigorous. That was the first wave of reform, but it didn’t have much of an effect.”

This makes a great point – creating a measuring stick for competency without some sort of standard to base education from hurts more than it helps.

The interviewer goes on to ask about what conditions are needed to implement standards.

“Cut the number of standards and the content within standards dramatically. If you look at all the national and state documents that McREL has organized on its Web site, you’ll find approximately 130 across some 14 different subject areas. The knowledge and skills that these documents describe represent about 3,500 benchmarks. To cover all this content, you would have to change schooling from K–12 to K–22. Even if you look at a specific state document and start calculating how much time it would take to cover all the content it contains, there’s just not enough time to do it. So step one toward implementing standards is to cut the amount of content addressed within standards. By my reckoning, we would have to cut content by about two-thirds. The sheer number of standards is the biggest impediment to implementing standards.”

Lots and lots of content and skills to learn across a diverse set of subject areas, with finite time to turn out individuals competent in the space. Sound similar to the situation in the Apache Hadoop ecosystem?

The interviewer then follows up by asking how this can be done with knowledge continuing to expand.

“It is a hard task, but not impossible. So far the people we’ve asked to articulate standards have been subject matter specialists. If I teach music and my life is devoted to that, of course I’m going to believe that all of what’s identified in the national documents is important. Subject matter experts were certainly the ones to answer the question, What’s important in your content area? To answer the question, What’s absolutely essential? you have to broaden that population dramatically to include all constituents—those with and without college degrees.”

This response aligns very well with the ODPi approach to creating Apache Hadoop standards. We aren’t in the business of creating full end-to-end comprehensive standards for what an Apache Hadoop Platform should offer, or what an Apache Hadoop-Native Big Data Application should adhere to, but instead focus on what’s truly important to provide that base level – those essential pieces a platform should offer. And I particularly like the last point, “expanding the scope of the conversation around standards to get diverse opinions and experiences,” which is something ODPi is uniquely positioned to drive.

One last quote, which I think shapes the “Why?” on this effort.

“Whether we focus on standards or not, we’re entering an era of accountability that has been created by technology and the information explosion.”

The enterprise has the same expectations – they want to lower the risks of Big Data investments, risks that are a byproduct of not having the staff to manage them. Fortune 500 executives need this in place to have any confidence in the technology, as the abysmal adoption rates have shown. In short, Apache Hadoop needs to be accountable for its enterprise growth.

ODPi Meetup Recap: “War Stories of Making Software Work with Hadoop”


Hadoop Summit is notorious for bringing together everyone who’s anyone in the Big Data world – and this year’s event, welcoming more than 4,000 attendees, was no different.


Not only was ODPi able to announce that five Apache™ Hadoop® distributions are officially ODPi Runtime Compliant, but we also hosted a meetup that centered on “War Stories of Making Software Work with Hadoop.”


Successfully migrating big data software to interoperate with one or more Apache™ Hadoop® releases requires unique engineering approaches and streamlined innovation. Our meetup discussed the importance and benefits of certifying compatibility between multiple Hadoop distributions. Those who have navigated this space for years without any true standardization shared their war stories.  

Attendees also heard from ODPi members hailing from big data software vendors and ISVs. The War Stories panel featured insights from Scott Gray, chief architect of IBM’s Open Platform for Apache Hadoop; Vineet Goel, principal product manager of Pivotal HDB & Hadoop at Pivotal; Paul Kent, VP of big data initiatives at SAS; and Smiti Sharma, principal engineer of big data and emerging technologies for EMC. These members have each ported their software to work with one or more Hadoop distributions.

They discussed technical challenges they overcame and why they believe ODPi will help simplify this for both end users and ISVs in the future.

After explaining to the room how their companies are committed to big data innovation and how their numerous technologies aid end users, Gray, Goel, Kent, and Sharma covered cross-organizational compatibility within the Hadoop space.


John Mertic’s first question to the panel was, “Before the concept of what ODPi is meant to deliver, what were the chief challenges you were running into?” (found at the 28:50 mark).

When diving into this question – most of which centered on their experience and the difficulties of supporting multiple, disjointed distributions – the panelists made some insightful statements.

Gray of IBM set the stage for these pain points, noting, “Hadoop evolves at an incredible pace and there’s this never-ending tension between what the customers want… and distros [being] pressed to keep up with this evolution, and we have all these products trying to chase the distribution… It makes it incredibly, insanely expensive… It really was in our best interest to try to put a little sanity into the landscape.”


Goel applauded ODPi’s baseline specifications and explained Pivotal’s arduous journey of taking on a new distribution (around the 34:00 mark). Mertic commented: “I like how you said, ‘If we had the money back from supporting all these distros, imagine the innovation we could have…’ I think that’s a really powerful statement.”

During the interactive Q&A that followed, an audience member asked for examples of the value proposition for end users of engaging with companies that are part of ODPi (starting after the 42:00 mark).

Sharma addressed this question, drawing on her experience in pre-sales: “You could benefit from being on an ODPi-compliant platform… if you want to have your application portable from Hadoop as an OS, it’s possible through being part of ODPi.”


“In the early days of Hadoop, you really did have to grow your own in-house talent,” said Kent, “but we’re entering the mature part of the lifecycle curve where there are lots of customers that just want to pick it up and use it. They don't really want to get into all these nuances. So the value of something like ODPi… will inevitably make a standardized path, where people can say, ‘If you don't go out of these lines, you’re pretty safe.’”

Catch a full recording of our meetup, centered on how ODPi fits into the Hadoop and Big Data ecosystem, here – and don’t forget to subscribe to our YouTube channel!

Hadoop Summit San Jose 2016 Wrap-up


We’re Making Good on our Pledge to Open the Big Data Ecosystem

As part of the industry convergence on San Jose, ODPi members and Linux Foundation staffers used Hadoop Summit to share our common commitment to grow Apache Hadoop and Big Data through a set of Specifications.


.@vVineet @ScottCGrayIBM @hornpolish & @smiti_sharma sharing “War Stories: Making Software Work w/ Hadoop”


@ODPiOrg booth at Hadoop Summit – those rocket footballs were a hit!


@IBMBigData booth before the show opened – Can you find the ODPi Rocket?


@CaskData captured plenty of attention with their focus on Applications and Insights, not Infrastructure and Integration


@Altiscale ready for the rush of attendees looking for Big Data as a Service

It was terrific seeing ODPi members and sharing ideas at the conference. And the conference sessions couldn’t have been more on point. In the words of Ben Markham from ODPi member Xiilab:

I particularly loved the session about Apache Nifi and how to build a smart home, as this is related to Xiilab and also something I’d personally love to do. The sheer amount of data that needs to be processed in order to make an efficient smart home is amazing, and it speaks to why we’re all so passionate about this industry!

Before describing the significant milestone achieved at Hadoop Summit, first let me provide a short recap on ODPi’s progress to date.

ODPi published its first Runtime Specification in March to specify how HDFS, YARN, and MapReduce components should be installed and configured. The Runtime specification also provides a set of tests for validation to make it easier to create big data solutions and data-driven applications.

  • The purpose?
    Increases consistency for ISVs and End Users when building on top of, integrating with, and running Hadoop.

  • Why?
    Because consistency around things like how APIs are exposed and where .jar files are located reduces engineering effort on low-value activities like maintaining compatibility matrices, so that more effort can go into building the features that customers care about.
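As a hedged illustration of that point (every distro name and path below is invented, not taken from any real distribution), here is roughly what "maintaining a compatibility matrix" looks like in ISV installer code, versus relying on one standardized layout:

```python
# Hypothetical sketch: without a runtime standard, an ISV installer keeps a
# hand-maintained, per-distro matrix just to locate Hadoop's client jars.

COMPAT_MATRIX = {
    ("distro-a", "4.1"): "/usr/distro-a/hadoop/lib",
    ("distro-b", "2.3"): "/opt/distro-b/current/hadoop-client/lib",
    # ...one entry per distro per release, updated forever as releases ship.
}

def jar_dir_without_standard(distro, version):
    # Any distro/version pair not yet in the matrix breaks the installer.
    return COMPAT_MATRIX[(distro, version)]

def jar_dir_with_standard(hadoop_home):
    # If every compliant distro exposes the same layout relative to a
    # well-known root, the ISV code collapses to a single rule.
    return hadoop_home + "/share/hadoop/common"

print(jar_dir_without_standard("distro-a", "4.1"))
print(jar_dir_with_standard("/usr/hadoop"))
```

The engineering effort saved is the difference between growing that matrix with every new release and writing the one-line rule once.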

That’s the promise and commitment ODPi and its members made to the industry when we published the Runtime Spec.  

At Hadoop Summit, ALL FIVE ODPi members that ship Apache Hadoop distributions announced that they achieved ODPi Runtime Compliance.


Cool – so how exactly does that Open the Big Data Ecosystem?

Two of the Distros that achieved Runtime Compliance, Hortonworks and IBM Big Insights, collectively partner with several hundred of the biggest Big Data ISVs and IHVs.

Altiscale, a cloud Big Data-as-a-service company; Infosys, which supports many government clients around the world with its Hadoop distro and custom Big Data apps on top of it; and ArenaData, which is making a name for itself bringing Hadoop and Big Data to more Russian and Eastern European businesses, also achieved Runtime Compliance.

Thanks to ODPi, today ANY of the applications that run on Hortonworks or IBM Big Insights can, WITH SIGNIFICANTLY LESS UP FRONT AND ONGOING engineering cost, support Altiscale, ArenaData and Infosys.

Pivotal lit the way by describing on their blog how Pivotal HDB was installed on the ODPi reference implementation and on one of the ODPi Runtime Compliant distributions with no modifications to standard installation steps.

That’s called Opening the Big Data Ecosystem!

Now it’s your turn to show your support for an Open Big Data Ecosystem

Tweet why YOU think Hadoop and Big Data need standards.

Share a challenge you’ve faced, maybe an engineering effort that just took way longer than it should have, or a customer support ticket that by rights should have taken minutes but instead took hours.

Be sure to tag @odpiorg and include the hashtag #ODPi4Standards in your tweet and you’ll be entered to win one of TEN $25.00 Visa Gift cards. Read contest rules here.*

*Eligibility Criteria: 10 people, tweeting 7/14/2016 – 7/18/2016, with constructive #ODPi4Standards feedback + @ODPiOrg tag or RT will win a $25 Visa gift card.

Are you a fan of #BigData and from #Believeland? Come hear “A kid from Akron” talk about ODPi


Ok, maybe not the “Kid from Akron” you are thinking of 😉

I’m excited to be the featured speaker at this month’s Cleveland Hadoop User Group meeting. This is a very vibrant group, with a large array of developers, business users, and students who are looking to better understand the Big Data and Hadoop landscape in the Northeast Ohio area.

When I talk about tech in the midwest, people often roll their eyes at me. But let me tell you that Big Data is huge in this area. Progressive Insurance had two keynote speakers at last month’s Hadoop Summit that dug into the technology behind predictive analytics for the insurance industry. And Explorys, a spinoff startup from the Cleveland Clinic that was acquired last year by IBM to lead their Watson Healthcare initiatives, is driving innovation in delivering personalized healthcare and great insight into disease prevention and cures. It’s the combination of foundational industries and new technology that is making former rust belt mainstays evolve into hot spots for technical innovation.

My talk will focus on giving the audience a glimpse into the challenges in the Apache Hadoop market and how ODPi is looking to enable a large-scale, yet easier-to-engage-with, ecosystem that benefits downstream consumers of this technology. And I’m looking forward to connecting with local members of this community to learn how they are using Apache Hadoop and how ODPi can make those investments go farther and their businesses more effective.

If you are in the Cleveland area or will be in town to visit what this great city has to offer, then come to the meet up on Monday, July 25, 2016 at 5:00 PM at the Progressive offices in Mayfield Village, OH (map). Worst case, you’ll have some excellent pizza :-).

ODPi Takes Hadoop Summit: San Jose


The much anticipated Hadoop Summit is next week, June 28-30 in San Jose, and is one of the Apache Hadoop community’s premier events. It will feature community-chosen technical sessions, business tracks based on real-world implementations and so much more from Hadoop users, developers and ISVs.

As an organization committed to an open ecosystem of big data, we are thrilled to be exhibiting at Hadoop Summit!

During the show, ODPi will be hosting a meetup open to the public. Whether you are attending Hadoop Summit or live in the San Jose area, be sure to join us:

ODPi will host a “War Stories of Making Software Work with Hadoop” meetup! Hear from big data software vendors and application developers who have worked with one or more Hadoop distributions about the technical challenges they have faced, and why they feel ODPi will help simplify this for future ISVs. RSVP for our meetup, held on June 27th at 6pm, here!

Be sure to stop by the ODPi booth (#107) too! We hope to see you at this year’s Hadoop Summit event and at the ODPi meetup!

Apache: BigData North America 2016 – Keynoters’ Perspective


During Apache: BigData North America 2016 last month, both Alan Gates and John Mertic keynoted the conference.


Alan discussed the age of data and data-defined applications, ODPi’s objectives, and what benefits ODPi brings to the larger big data ecosystem.


John spoke about building a stronger Apache Hadoop ecosystem and how the Apache Software Foundation and ODPi complement one another.

Outside of their keynotes, what was their experience at Apache: BigData? Both speakers have shared their key take-aways and insights.

Alan Gates

With my ODPi hat on, I think the best thing about Apache Big Data was the fact that so many people learned about who we as ODPi are and what we do. Many people I talked to had never heard of us. And when we explained our goals and what we have done so far, they were excited. And for many who had heard of us but were confused (are we competing with Apache? are we another distribution? why do we even need ODPi?), we were able to communicate our mission and answer their questions.

From a more personal perspective, I enjoyed the technical level of the sessions.  The sessions I went to had a high level of interesting technical content, with engineers sharing the work they were doing.  The questions I received in the two sessions I presented on Hive showed that the audience was engaged with Hive, how it works, and the changes that the community is making in the project.  It is always interesting to hear what others are working on and to be able to share your work with others who appreciate and understand it.

To learn more about Alan’s perspective, read his recent Hortonworks blog, ODPi Helps ASF Accelerate Big Data Ecosystem.

John Mertic

This conference was a first for me… not the conference in general – I have been to ApacheCon before – but the Big Data-specific section, and being able to keynote such an event. As any keynoter would say, it is a privilege and an honor first and foremost, but it also gives you a chance to pause and see the event at an entirely different level.

The first thing that stands out is that I got a chance to see Big Data from the innovation creator’s perspective. One project that stood out to me in this regard was Apache Tinkerpop. At its base level, it’s a framework for easily doing graph analysis on compatible sets of data. However, what fascinated me was the conversations around the problems you could solve with it. For example, “What if you could build a car recommendation engine with it, that took into account which car(s) you’ve owned, where you are at in life (age, family, career, location), and information on the vehicles themselves, to recommend what to purchase?” This higher-level thinking brought me back to the software development conversations I had early in my career, and shows how technologists are getting more use-case driven in their approaches.
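To give a flavor of that conversation, here is a toy sketch in plain Python – not TinkerPop’s actual Gremlin API, and with entirely invented data – of the graph-shaped recommendation idea: people, cars, and life-stage attributes as vertices, ownership and similarity as edges.

```python
# Toy graph as (source, edge_label, destination) triples. All data invented.
edges = [
    ("alice", "owned", "hatchback-x"),
    ("alice", "stage", "young-family"),
    ("bob", "owned", "hatchback-x"),
    ("bob", "stage", "young-family"),
    ("bob", "owned", "minivan-y"),
]

def out_vertices(v, label):
    """Follow all edges with the given label out of vertex v."""
    return {dst for src, lbl, dst in edges if src == v and lbl == label}

def recommend(person):
    # Find people who share a car or a life stage with this person,
    # then suggest cars they own that this person does not.
    mine = out_vertices(person, "owned")
    stages = out_vertices(person, "stage")
    peers = {src for src, lbl, dst in edges
             if src != person and (dst in mine or dst in stages)}
    return {car for p in peers for car in out_vertices(p, "owned")} - mine

print(recommend("alice"))  # {'minivan-y'}
```

A real graph framework earns its keep once the traversals get deeper than this two-hop example, but the shape of the question – "walk the relationships, then aggregate" – is the same.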

The other thing that stood out was that with all this innovation, there’s still confusion. A stat I heard was that a new project comes out of Apache incubation every 6 weeks. That isn’t in and of itself a problem, as I learned talking with attendees, but the result of that pace of innovation is. The feedback loop is hard to maintain with such an overwhelming firehose. Commercial support dies off quickly after the first 15-20 key projects. And the consistency of the Apache Hadoop products that package these projects into a canonical distribution tends not to be there. Attendees expressed frustration with both the upstream projects and the distro vendors.

My keynote focused on these two themes, telling the story that innovation is key, but that it isn’t possible without some level of standardization. I made the statement that the lack of a stable base is what holds Apache Hadoop back, and based on the conversations I had, this definitely is true. Attendees I talked to knew that standardization wasn’t a barrier to innovation, but an enabler of it. More specifically, they are able to invest more in Apache Hadoop as a unified ecosystem is built around it.

A key takeaway from the week – Apache Hadoop users see the importance of the technology, but are pushed away by the complexity of finding a great way to engage with their use cases. This is where I see ODPi being able to close that loop.

Our Week at Apache: BigData North America


Last week, ODPi participated in Apache: BigData North America in Vancouver, B.C. as an exhibiting platinum sponsor where we announced our gold sponsorship of the Apache Software Foundation.

During the show, we connected with the Apache community by presenting two keynote presentations, holding multiple panel and tutorial sessions, and hosting Apache project office hours at our booth.


Here’s a look back:

ODPi Becomes Gold Sponsor of The Apache Software Foundation

On Wednesday, May 11, we announced our gold sponsorship of The Apache Software Foundation (ASF). We joined existing ASF sponsors Hortonworks, IBM, Pivotal, and WANDisco – who are also ODPi members – to advance open source projects for the big data ecosystem.

“We are pleased to welcome ODPi to the ASF Sponsorship program and their support of Apache big data projects,” said ASF Vice Chairman Greg Stein. “As analysts project that enterprises will fully embrace the Apache Hadoop ecosystem, Sponsor support is even more vital to our success in spearheading industry-leading innovations that are developed in a trusted, community-driven environment.”

This announcement generated positive feedback from the Apache community during the show, as well as many articles – Datanami, Channel Insider, CIO, StorageReview, and IBM DeveloperWorks TV.

Some of our favorite quotes include:

“Both our organizations are laser focused – we want a stronger Hadoop ecosystem,” Mertic said, of the ASF and ODPi. “But let’s be honest, it hasn’t been easy. The Apache project really focuses on those raw components, the peanuts and the chocolate, if you will. But if you’re an end user, you’re looking for that Reese’s Peanut Butter Cup.” – Ian Murphy,

“We’re committed to making sure of Apache’s success,” Mertic says. “Our success depends on the upstream project success. Furthermore we’re not looking to do development separate from Apache. We want Apache to own the development process. We want them to own the projects process….We want Apache to keep doing what Apache’s great at, which is building amazing, incubated, governed projects.” – Alex Woodie, Datanami

“John Mertic, director of ODPi, said a group made up of ISVs in the ODPi consortium (which counts IBM, GE, Splunk, EMC, SAS Institute, Infosys and Capgemini among its members) is trying to make sure there are enough checks and balances to prevent Hadoop from fragmenting into a set of incompatible distributions. While each distribution uses the same core base, it’s become apparent that the leading providers of these distributions are creating extensions that could ultimately force ISVs to incur higher costs by having to support applications optimized for a particular instance of Hadoop.” – Mike Vizard, Channel Insider

The conversations did not stop there – Twitter was also buzzing with excitement.

Keynotes, Panels, and Technical Sessions


  • Roman Shaposhnik & Konstantin Boudnik from Pivotal discussed How ODPi Leveraged Apache Bigtop to Get to Market Faster (and You Can Too!). Attendees learned how contributions from ODPi members are helping Apache Bigtop get even stronger and provide an integration platform for the next-generation big data technologies.

  • Milind Bhandarkar of Ampool dove into a riveting story Standing on Shoulders of Giants: Ampool Story and also discussed how Ampool is contributing to several ASF projects in ODPi.

ODPi panel, ODPi: Advancing Open Data for the Enterprise, gave attendees an overview of how Hadoop distributors, ISVs, SIs, and enterprises (end users) will benefit from standardization. Panel members included (from right to left) Milind Bhandarkar of Ampool, Roman Shaposhnik of Pivotal, Alan Gates of Hortonworks, and Susan Malaika of IBM. Moderated by Gregory Chase of Pivotal.


  • Alan Gates from Hortonworks provided a thought-provoking Keynote: ODPi 101: Who We Are, What We Do and Don’t Do. Gates outlined the ODPi Core, a set of software components, a detailed certification and a set of open source tests to make it easier to create big data solutions and data-driven applications.



In his keynote, John Mertic from ODPi presented ODPi and ASF: Building a stronger Hadoop ecosystem, where he detailed how ODPi’s specifications and by-laws reinforce the role of the ASF as the singular place where Apache Hadoop development occurs. He also announced ODPi’s gold sponsorship of the ASF during his keynote!


  • John Mertic of ODPi and Jim Jagielski of the Apache Software Foundation teamed up for a riveting conversation about ODPi and ASF Collaboration. Attendees asked questions and learned about how the ASF and ODPi complement each other, where the big data ecosystem is heading, and more.

Apache Project Office Hours

On Tuesday during Apache: Big Data, we hosted Apache project office hours, where any project could sign up for a 30-minute time slot and host their meeting in our lounge. Communities from Geode, Tinkerpop, Spark + RDBMS, Hawq, and MADlib participated.



Overall, it was a wonderful week. We are looking forward to Hadoop Summit, June 28-30 in San Jose. Hope to see you there!

Altiscale, Capgemini, IBM, and Unifi Software Discuss the Future of Big Data


Last week we were lucky enough to sit down with ODPi members Steve Jones, global vice president of Big Data at Capgemini; Mike Maciag, COO at Altiscale; Sean Keenan, co-founder and vice president of products at Unifi Software; and Todd Moore, vice president of Open Technology at IBM, to discuss the future of big data, cognitive computing, and how ODPi can help drive innovation. Todd Moore also expanded on his thoughts from his recent blog post on ODPi.

Here is what they had to say:

1. What are the challenges customers face with Hadoop and how will ODPi solve them?

Maciag, Altiscale: ODPi makes it much easier to select a Hadoop platform, select applications that will run successfully on that platform, and switch platform providers in the future, if necessary. Hadoop is actually a large ecosystem of different software projects and each component gets updated and released on a different schedule. As a result, customers today have a tough time making a decision about which platform to pick and application vendors have a hard time making their apps work on multiple platforms. With the ODPi standard, customers have the confidence that they are picking a certified solution with an array of applications that have been made for it and will run successfully. And if you don’t like the support that you’re getting from your platform provider, you can switch with confidence, since your applications will still work. The existence of a standard will allow the Hadoop ecosystem and the applications that run on it to flourish, leading to greater business value for customers.

Jones, Capgemini: The objective of ODPi is to develop a stronger and more innovative ecosystem by providing vendors and companies a firmer and more assured base from which to work. The challenges have been poor interoperability, fragmented initiatives in the market, and competing technical solutions. ODPi makes it easier to deploy big data solutions and data-driven applications for a whole range of use cases, because of the cross-compatibility and standardization of this common reference platform.

Moore, IBM: Compatibility has been a huge issue for customers, partners, and application ISVs who write on top of Hadoop. ODPi will give developers and organizations confidence that an application written for one compliant distribution will run on other compliant distributions as well.

ODPi will drive interoperability and compatibility by providing a common platform against which to certify applications. It also opens up choices for developers by enabling this interoperability across different distributions within an organization.

The ODPi test framework and self-certification also align closely with the Apache Software Foundation by leveraging Apache Bigtop for comprehensive packaging, testing, and configuration. More than half the code in the latest Bigtop release originated in ODPi.

Keenan, UNIFi: ODPi is providing standards sorely needed within the Hadoop ecosystem. This will enable a more mature, solidified, and hardened stack as vendors continue to adopt the standard. It will remove the barriers of distribution and vendor lock-in.

2. What do you see for ODPi in the future?

Maciag, Altiscale: ODPi provides reassurance to our customers that whatever they build on Altiscale will be something they can use in other places and in other situations. It gives them confidence in using a vendor. It also allows them to run applications in a hybrid environment, for example, using Hortonworks on premises and Altiscale in the cloud, and to easily transition from an on-premises environment to a fully cloud-based one. We are talking to many customers who want to make a transition to the cloud and are running on a different Hadoop distro today. Since Altiscale is ODPi compliant, they are comfortable that they can easily make that transition.

Jones, Capgemini: The next question for ODPi is what other pieces need to be standardized but also how to move towards a greater degree of verification-based compliance. Certainly as our clients see the tangible proof of how ODPi can reduce complexity and costs, we can expect adoption to accelerate.

Moore, IBM: We expect ODPi to grow. There is significant innovation happening in the industry right now, and we like that ODPi will be able to provide solutions for these challenges. By working together with the other member companies, we can enable organizations to better innovate with Hadoop and other big data technologies thanks to standardization.

Keenan, UNIFi: ODPi has made great strides and is quickly maturing into a standard with a groundswell movement behind it. Our expectation and belief, based on these early successes, is that this program will continue to see additional vendors and platform providers get on board. Hadoop has won the battle as the analytics platform of choice; now it is time to build the appropriate standards to accelerate innovation and solution development.

3. Can you share some ways ODPi benefits your customers?

Maciag, Altiscale: The members of ODPi strongly believe that customers, and the value they get from Hadoop, should drive the industry, not any particular vendor. When customers win, the industry and vendors win, too. The release of the runtime spec shows that standards can be established and that the vendors can work together. I look forward to the expansion of the spec, and you'll start hearing more and more about the real-world benefits it provides.

Jones, Capgemini: ODPi helps customers by enabling an ecosystem of suppliers who can work across multiple Hadoop vendors. This enables procurement and IT departments to focus on supporting the business ambitions rather than sorting out the technical issues of interoperability and competing technologies.

Moore, IBM: Within the IBM ecosystem, big data plays a tremendous role. As we begin to dive deeper into the cognitive world, there has been an explosion of structured and unstructured data feeding into it. Hadoop, big data, machine learning: they all play into how this cognitive world will evolve. So having a platform to count on, starting with a small, specific, and consistent packaging model that lives within the ecosystem, is priceless. As people take advantage of this and deploy into their own cloud infrastructure, they will be able to test once and run everywhere. In this way, standardization is an enabler and a huge deal as we move into the cognitive era. IBM is looking forward to distributing ODPi applications. This will be a key differentiator for IBM in serving customers.

Keenan, UNIFi: Our customers will benefit by reaping the rewards of a more mature data processing platform on which they can build their future analytics initiatives. They also benefit from a Unifi perspective, as we'll be able to spend more resources delivering features, capabilities, and solutions that solve their business problems, versus spending cycles supporting all the variants of services across a processing platform that lacks standards.

4. What advice would you give to others looking to join ODPi?

Maciag, Altiscale: If you’re an application vendor looking to accelerate your business and find more customers, this is the place to be. ODPi certification takes friction out of the sales process and provides peace of mind in a rapidly changing Hadoop ecosystem. By joining ODPi, you get to help influence the standard and know sooner about upcoming developments, so that you can take advantage of them more quickly. The membership of companies today really represents leading thinkers and makers in Hadoop, and so it’s a great group to join.

Jones, Capgemini: Get involved, look at what you want ODPi to do next in practical terms, and join with the ambition of helping to make that happen.

Moore, IBM: You do not have to be a member to participate. The goals are self-explanatory, organizations can self-certify, and everything can be found on GitHub. If you are a company, ODPi is a great community and has made significant progress, quickly. Please join us and participate in this evolution. It is certainly something Hadoop has needed for years and years.

Keenan, UNIFi: We would encourage participation. This consortium is good for the community and market as a whole and will only create additional value as more members contribute to the program.

5. Why did Capgemini join ODPi?

Jones, Capgemini: Capgemini joined ODPi because we believe in the ability of open standards to drive innovation. From our involvement with The Open Group to more technical and sector-focused standards efforts, we've consistently found that standards help drive innovation, as well as, of course, bringing the benefits of streamlining and cost efficiency.
