ODPi and AtScale to Partner on 3rd Annual Big Data Maturity Survey

We are very excited to announce that ODPi will join AtScale and their other partners Cloudera, Hortonworks and MapR in the 3rd annual Big Data Maturity survey.

The Big Data Maturity Survey is designed to provide a global view of big data use by industry, workload and geography.

The survey takes less than 5 minutes and you can take it here (http://www.surveygizmo.com/s3/3648348/odpi).

Your responses will remain confidential and will be used only to construct the aggregated results.

In appreciation for your time, you’ll be entered in a drawing to win a $100 Amazon gift card.

Why ODPi is Partnering with AtScale

We have referenced the Big Data Maturity survey in prior blog posts, and we couldn’t be happier that AtScale asked us to help get the word out to our members and to the enterprise production Apache Hadoop community we serve.

AtScale’s report confirms many of the things ODPi members tell us about the state of enterprise-wide production Apache Hadoop, and the challenges many businesses face in getting there.

73% in Production

As the callout on the left shows, the 2016 Big Data Maturity survey found that 73% of respondents run Hadoop in production.

ODPi members and community validate this figure, with one caveat: we want to understand what users actually mean when they say “production.”

A look at Figure 1 shows what we mean.

Figure 1: Production ≠ Production

Clearly, Lab (a.k.a. experimental) deployments do NOT qualify as production. But PoCs, pilots and, of course, enterprise-wide deployments DO.

So, if my calculations are correct: 14% PoC  + 29% Pilot + 28% Enterprise-wide = 71% Total in Production
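For readers who want to check the arithmetic, here is a minimal sketch of the tally above. The stage percentages and the 73% AtScale figure come from the text; the variable and function names are purely illustrative, and the pre-enterprise-wide share assumes that all stages together account for 100% of respondents.

```python
# Stage shares (percent of respondents) as quoted in the text above.
production_stages = {"PoC": 14, "Pilot": 29, "Enterprise-wide": 28}

# Share reported as "in production" by the 2016 Big Data Maturity survey.
atscale_production = 73


def total_share(shares):
    """Sum the percentage shares of the given stages."""
    return sum(shares.values())


def pre_enterprise_share(enterprise_wide, total=100):
    """Share not yet enterprise-wide (assumes all stages sum to `total`)."""
    return total - enterprise_wide


production_total = total_share(production_stages)
gap_vs_atscale = abs(atscale_production - production_total)

print(f"In production (PoC + Pilot + Enterprise-wide): {production_total}%")
print(f"Difference from the AtScale figure: {gap_vs_atscale} points")
print(f"Not yet enterprise-wide: {pre_enterprise_share(28)}%")
```

Under those assumptions, the two production figures differ by only two percentage points.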

It’s great when two different bodies of research arrive at nearly the same conclusion (71% here versus 73% in the AtScale survey) using different approaches. It means we can have even more confidence in the results!

BUT, 72% of users are in Pre-Enterprise-wide Production

As bullish as ODPi is about the production opportunities for Hadoop and Big Data, what we see in Figure 2 is that 72% of Hadoop users are piled in what we’ll call limited use scenarios.

Figure 2: Most Hadoop users not yet in Enterprise-wide

ODPi exists to provide the neutral space where the Big Data industry can come together to knock down our common barriers to enterprise-wide production Hadoop.

Governance and Security Key to Unlocking Door to Enterprise-wide Use

As the callout from the 2016 Big Data Maturity report shows, governance is the fastest-growing concern.

ODPi is also focused on the need for an industry-wide approach to Hadoop governance. Partnering with leading Big Data users, platform providers, and app vendors, ODPi will establish standard APIs and compliance programs that will ensure compatibility across BI tools and various metadata stores.

Conclusion

Be sure to take 5 minutes out to complete this year’s Big Data Maturity survey. You’ll be able to use the results to benchmark your efforts against your industry peers and gain valuable insight into technology trends that impact you.

Here at ODPi, we will use the research to continue focusing on building open programs, open source projects and free resources to smooth the path towards enterprise-wide production Hadoop.

ODPi Takes Big Data Day LA

By Roman Shaposhnik

Earlier this month, I attended Big Data Day LA – a vibrant community gathering of data and technology enthusiasts in sunny Los Angeles. Located on the USC campus, the 5th annual event was organized by local Big Data user groups and volunteers!

Unlike many of the big data industry’s events, Big Data Day LA wasn’t a company-owned conference and registration was fully covered by data-driven sponsors, like Hortonworks, Disney Interactive, WANdisco and more – making the event free for anyone to attend. As such, it wasn’t surprising that it attracted such a big crowd, with more than 1,500 people in attendance!

During the conference, I presented “Big Data on The Rise: Views of Emerging Trends & Predictions from real life end-users” where I offered the audience an overview of key trends emerging in 2017 within the Hadoop and Big Data ecosystem. My session also covered data from the ODPi End User Advisory Board (TAB) and real end-user perspectives on how companies are using Big Data tools, challenges they face and where they are looking to focus investments. My talk was well received by those in attendance and quite a few people approached me following the session to discuss their new understanding of ODPi and how it relates to traditional vendors, the Apache Software Foundation, the Linux Foundation and the enterprise.

The remainder of the conference featured a great selection of talks, especially for data scientists and software developers and, unsurprisingly, the entertainment industry track was a huge hit – featuring talks from Netflix, Warner Brothers, Guitar Center and more.

And, of course, being a Silicon Valley guy, I simply had to check out all the startup buzz at the conference as well. Not only did the conference feature an awesome Startup Showcase track, but there were also quite a few presentations pushing the envelope on state-of-the-art machine learning. Just to give you a quick taste, I suggest you go to http://novamente.ai/ and check out the truly sci-fi projects these guys are tackling. Their presentation on how to apply AI to producing ever-higher-grossing movies (think scripts, casting, visual effects and more) had the audience on the edge of their seats.

A few other sessions I particularly enjoyed include one from Bain & Company where they tried to put big data and machine learning in the context of organizational shifts required in any traditional enterprise, in order to realize full value from big data insight. On the flip side, even if you have your organization all lined up for digital transformation, you still have to be mindful of challenges on the technical side of machine learning. The notion of a hidden, growing technical debt in these complex, end-to-end machine learning pipelines is something that we all should keep in mind and Irina Kukuyeva’s presentation did a great job of highlighting some of the same important areas.

Overall, Big Data Day LA was a fun and dynamic event that harnessed the upstream community and showcased the importance of data at every level of business.

Making Production Hadoop Faster, Easier and More Productive: ODPi Member Conversation with Zettaset

In our fifth ODPi Member Conversation podcast, John Mertic spoke with Zettaset’s Director of Product Management, Sesh Ramaswami, and CMO, John Armstrong.

Their multi-perspective conversation covered the security challenges of deploying Hadoop in production, data governance and security expectations, and how nonprofit organizations like OASIS and ODPi can help drive standards for the big data ecosystem.

Touching on big data security within production environments, Ramaswami dove right into enterprise expectations around data governance and data processing.

Referring to the role data governance plays in the Hadoop ecosystem, Armstrong noted that security wasn’t always a concern, as data was in isolation away from production.

However, the space has changed drastically over the years. Touching on this shift, Armstrong said, “When [enterprises] finally want to move into production, and want to take those data environments and put them into the mainstream… we need to have some real security in place before you plug into the rest of the network —  because Hadoop coexists in large enterprise environments in conjunction with relational database environments, maybe other NoSQL, so it’s not alone and it has to play well with others.”

To hear more of the group’s insight, tune in to the episode on our YouTube channel!

Subscribe to our YouTube channel and follow us on Twitter to catch upcoming episodes of the ODPi Member Conversation podcast series!

Making Production Hadoop Faster, Easier and More Productive: ODPi Member Conversation with IBM

In our fourth ODPi Member Conversation podcast, John Mertic met with IBM’s Worldwide Analytics Architect Leader for Data Lakes, Neil Stokes.

Their insight-rich conversation kicked off with a discussion around machine learning and cognitive vision, along with how an ODPi Compliant Hadoop environment relates to cognitive computing.

Stokes, who has spent more than 20 years working at IBM, has a unique perspective on how Hadoop and cognitive systems afford enterprises the ability to ingest and make sense of the rawest possible forms of data, and why such systems are only as good as the corpus of data to which they have access.

Referring to these sets of broad, invaluable data, Stokes noted “Interoperability is key here… Groups like ODPi – who have the ability to facilitate that level of interoperability across vendors, across platforms, across different technologies – there is a tremendous value that groups like ODPi bring [to the ecosystem].”

To hear more of Stokes and Mertic’s discussion, tune in to the episode on our YouTube channel!

Subscribe to our YouTube channel and follow us on Twitter to catch upcoming episodes of the ODPi Member Conversation podcast series!

Twitter Poll Results: Is Apache Hadoop Running in Production?

Following the publication of our White Paper, 2017 Preview: The Year of Enterprise-wide Production Hadoop, we ran a series of Twitter polls to get a rough sense of where the market is on the following 4 questions:

  1. Does your company use Hadoop in production?
  2. What stage are most of your Hadoop deployments in (lab, PoC, pilot, enterprise-wide production)?
  3. When will you have Hadoop in enterprise-wide production use?
  4. What challenges did you encounter while expanding Hadoop use?

We started with the basics, asking first:

The split between production and non-production use is in line with what we hear from our community.

As we discuss at length in the white paper, this concept of “production” Hadoop can be misleading. For instance, pilot deployments and enterprise-wide deployments are both considered “production,” but they are vastly different in terms of DataOps, as Table 1 below illustrates.

Table 1: DataOps Considerations from Lab to Enterprise-wide Production

In the next poll, we learned that 72% of Hadoop deployments are stacked up in the pre-enterprise-wide stages.

One of the other diagrams you’ll find in our white paper is the Enterprise Hadoop Deployment Continuum. In the version below, I have added the percentages from the Twitter poll in each stage.

Figure 1: Most Hadoop deployments are in pre and limited production.

With this established, we then asked the Twitterverse when they expect to be enterprise-wide with Hadoop. Reassuringly, the same 28% that told us they were enterprise-wide in poll #2 reiterated this in poll #3.

Less reassuring, however, is that only 9% of those that are presently pre-enterprise-wide have concrete plans to reach enterprise-wide use in the next 12 months, and even fewer have such plans in the next 24 months.

An eyebrow-raising 55% said that they’re not sure when they will reach enterprise-wide deployment.

And when asked about the challenges big data pros faced in increasing their use of Hadoop, responses were very evenly distributed across the four big areas we hear about from the ODPi community.

ODPi is here to remove risk and uncertainty from Hadoop and Big Data. We do this through comprehensive testing suites that improve predictability and through compliance programs to ensure interoperability. In other words, ODPi is here to smooth and illuminate the path to enterprise-wide production use of Hadoop for the 55% of respondents that don’t know when (if?) they will get there.

And the ODPi Special Interest Groups, or SIGs, were set up to address the widespread challenges that poll #4 surfaced.

Like all the technical work at ODPi, SIGs are wide open for all to participate in.

Join us and help drive toward solutions in these areas.
