All Posts By

Greg Wallace

ODPi and AtScale to Partner on 3rd Annual Big Data Maturity Survey

By | Blog

We are very excited to announce that ODPi will join AtScale and its other partners Cloudera, Hortonworks, and MapR in the 3rd annual Big Data Maturity Survey.

The Big Data Maturity Survey is designed to provide a global view of big data use by industry, workload and geography.

The survey takes less than 5 minutes, and you can take it here (http://www.surveygizmo.com/s3/3648348/odpi).

Your responses will remain confidential and will be used only to construct the aggregated results.

In appreciation for your time, you’ll be entered in a drawing to win a $100 Amazon gift card.

Why ODPi is Partnering with AtScale

We have referenced the Big Data Maturity Survey in prior blogs, and we couldn't be happier that AtScale asked us to help get the word out to our members and the enterprise production Apache Hadoop community we serve.

AtScale’s report confirms many of the things ODPi members tell us about the state of enterprise-wide production Apache Hadoop, and the challenges many businesses face in getting there.

73% in Production

As the callout on the left shows, the 2016 Big Data Maturity survey found that 73% of respondents run Hadoop in production.

ODPi members and community validate this figure, with one caveat: we want to understand what users actually mean when they say “production.”

A look at Figure 1 shows what we mean.

Figure 1: Production ≠ Production

Clearly, Lab (a.k.a. experimental) deployments do NOT qualify as production. But PoCs, Pilots, and of course enterprise-wide deployments DO.

So, if my calculations are correct: 14% PoC + 29% Pilot + 28% Enterprise-wide = 71% total in production.
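For readers who want to double-check that arithmetic, here is a trivial sketch; the stage percentages are the ones quoted above, and nothing else is assumed:

```python
# Stage percentages quoted in this post (Figure 1).
# Lab deployments are excluded because they don't count as production.
production_stages = {"PoC": 14, "Pilot": 29, "Enterprise-wide": 28}

total_in_production = sum(production_stages.values())
print(f"{total_in_production}% total in production")  # prints "71% total in production"
```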

It’s great when two different bodies of research arrive at the same conclusion using different approaches. It means we can have even more confidence in the results!

BUT, 72% of users are in Pre-Enterprise-wide Production

As bullish as ODPi is about the production opportunities for Hadoop and Big Data, what we see in Figure 2 is that 72% of Hadoop users are piled up in what we’ll call limited-use scenarios.

Figure 2: Most Hadoop users not yet in Enterprise-wide

ODPi exists to provide the neutral space where the Big Data industry can come together to knock down our common barriers to enterprise-wide production Hadoop.

Governance and Security Key to Unlocking Door to Enterprise-wide Use

As the callout from the 2016 Big Data Maturity report shows, Governance is the fastest growing concern.

ODPi is also focused on the need for an industry-wide approach to Hadoop governance. Partnering with leading Big Data users, platform providers, and app vendors, ODPi will establish standard APIs and compliance programs that will ensure compatibility across BI tools and various metadata stores.

Conclusion

Be sure to take 5 minutes out to complete this year’s Big Data Maturity survey. You’ll be able to use the results to benchmark your efforts against your industry peers and gain valuable insight into technology trends that impact you.

Here at ODPi, we will use the research to continue focusing on building open programs, open source projects and free resources to smooth the path towards enterprise-wide production Hadoop.

Twitter Poll Results: Is Apache Hadoop Running in Production?

Following the publication of our White Paper, 2017 Preview: The Year of Enterprise-wide Production Hadoop, we ran a series of Twitter polls to get a rough sense of where the market is on the following 4 questions:

  1. Does your company use Hadoop in production?
  2. What stage are most of your Hadoop deployments in (Lab, PoC, Pilot, Enterprise-wide production)?
  3. When will you have Hadoop in enterprise-wide production use?
  4. What challenges did you encounter while expanding Hadoop use?

We started with the basics, asking first:

The split between production and non-production use is in line with what we hear from our community.

As we discuss at length in the white paper, this concept of “production” Hadoop can be misleading. For instance, pilot deployments and enterprise-wide deployments are both considered “production,” but they are vastly different in terms of DataOps, as table 1 below illustrates.

Table 1: DataOps Considerations from Lab to Enterprise-wide Production

In the next poll, we learned that 72% of Hadoop deployments are stacked up in the pre-enterprise wide stages.

One of the other diagrams you’ll find in our white paper is the Enterprise Hadoop Deployment Continuum. In the version below, I have added the percentages from the Twitter poll in each stage.

Figure 1: Most Hadoop deployments are in pre and limited production.

With this established, we then asked the Twitterverse when they expect to be enterprise-wide with Hadoop. Reassuringly, the same 28% that told us they were enterprise-wide in poll #2 reiterated this in poll #3.

Less reassuring, however, is that only 9% of those presently pre-enterprise-wide have concrete plans to get to enterprise-wide in the next 12 months, and even fewer have such plans for the next 24 months.

An eyebrow-raising 55% said that they’re not sure when they will reach enterprise-wide deployment.

And when asked about the challenges big data pros faced increasing their use of Hadoop, responses were very evenly distributed across the four big areas we hear from the ODPi community.

ODPi is here to remove risk and uncertainty from Hadoop and Big Data. We do this through comprehensive testing suites that improve predictability and through compliance programs to ensure interoperability. In other words, ODPi is here to smooth and illuminate the path to enterprise-wide production use of Hadoop for the 55% of respondents that don’t know when (if?) they will get there.

And the ODPi Special Interest Groups, or SIGs, were set up to address the widespread challenges that poll #4 surfaced.

Like all the technical work at ODPi, SIGs are wide open for all to participate in.

Join us and help drive toward solutions in these areas.

Predictable Hybrid Hadoop Blog Series – Crossing the Chasm

In the previous blog in this series, we outlined some of the important ways that running Hadoop in production – especially in enterprise-wide production – differs from point solutions and PoCs.

As a leading downstream community of big data vendors, users and platform providers, ODPi is focused on tackling the security, governance, lifecycle management and application portability needed to run Hadoop at scale.

A classic way to think about technology maturity is Geoffrey Moore’s Chasm model. In our new white paper, we plot key Hadoop milestones against the technology adoption curve (see image below) and argue that the things the ODPi community is focused on are essential to continuing the adoption of this transformative technology.

An adaptation of Everett Rogers’ famous S-shaped diffusion of innovations curve, the Chasm model argues that users on the left of the chasm are fundamentally different from those on the right. The chasm separates users by adoption trigger and motivation: on the left, it’s all about competitive advantage at nearly any cost; on the right, it’s about continuity of operations and keeping up with the Joneses.

As awesome as this model is, it has sometimes been co-opted. One way this happens is by applying it to a Product when in fact it needs to apply to a Category. This is one reason why we are so bullish about our work at ODPi: we explicitly acknowledge that the only way Hadoop and associated Big Data solutions can cross the chasm to mainstream adoption is by working together to define category-wide, NOT vendor-specific, answers to questions like lifecycle management, security and governance, and application portability. These are the things that address early and late majority users’ interest in stability and operational continuity.

When thinking about what it really means for a technology to be a platform, we like the way Sam Ghod puts it:

A platform abstracts away a messy problem so you can build on top of it. Platforms do this by delivering portability and extensibility.

With ODPi Releases 1.0 and 2.0 in place, we invited application vendors to self-certify that their applications work unmodified across multiple ODPi Runtime Compliant Hadoop distros. As of this writing, twelve applications from leading vendors like SAS, IBM and DataTorrent have completed the self-certification.

We believe that savvy Enterprise CDOs, CIOs, CTOs and Chief Information Security Officers (CISOs) should carefully consider the platform independence that ODPi’s Interoperable Apps program delivers before making their Hadoop platform choices. If one of your preferred vendors isn’t listed either as an Interoperable App or as a Runtime Compliant Platform, let that vendor know that it matters to you.

In 2017, we’re heads down adding to our existing specifications and creating new workstreams through our Special Interest Groups. We invite you to get involved. If you are a Twitter user, be sure to follow @odpiorg and participate in our ongoing polls.

Predictable Hybrid Hadoop Blog Series – DataOps Considerations From Lab to Enterprise-wide Production

In last week’s blog, The Hadoop Deployment Continuum, we covered how “in production” actually refers to a very diverse set of deployment scenarios. Anything from a PoC to a point solution to a departmental deployment to enterprise-wide production can be, and often is, called “production” use.

This blog focuses on the step-change DataOps requirements that come when you take Hadoop into enterprise-wide production.

As enterprises plan to move Hadoop and Big Data into enterprise-wide production scale out, they face a number of challenges.

Table 1, taken from our recent White Paper, details how running Hadoop and Big Data at enterprise-wide production requires a significant re-think across multiple dimensions.

The good news is that these are the very same challenges the ODPi big data community has been working on for over a year. Through our ODPi Compliance and Interoperable Apps programs, enterprises get stacks that are validated across a number of platforms, providing needed support for multi-vendor procurement policies. In the words of Gene Banman, CEO of ODPi member DriveScale: “Enterprises have varying big data needs that require flexible and interoperable platform components. Becoming a member of ODPi will allow us to better maximize data center efficiency for Hadoop with interoperability for enterprise-grade deployments.”

Our ongoing work to validate workloads across cloud environments promises to extend ODPi predictability even further.

From a lifecycle management perspective, our Application Installation and Management specification covers requirements and guarantees for custom service specifications and views. Importantly, this spec, like all ODPi specs, is developed in the open and guided by the ODPi Technical Steering Committee (TSC), which is pulled from the entire Big Data industry. ODPi benefits from the involvement of end users, Hadoop platform providers, solution providers, and ISVs.

Last but certainly not least, our Special Interest Groups (SIGs) are looking into the following areas that are key to predictable enterprise-wide operations:

If these things matter to you, we invite you to get involved with any of these SIGs and/or join our slack channel and work with us to co-create a predictable hybrid future for Hadoop.

Predictable Hybrid Hadoop Blog Series – The Hadoop Deployment Continuum

In working on the recent ODPi White Paper, a few things came into much sharper focus for the team here.

First is that “production” is a loaded term. Even though there is really good research from places like AtScale reporting that 73% of respondents run Hadoop in production, we think this term needs to be unpacked.

That’s why we worked across our community, including ODPi members and participants in our User Advisory Board, on this Enterprise Hadoop Deployment Continuum graphic.

The very simple idea here is to plot Hadoop deployments from the lab all the way to enterprise-wide production use, and to lay out, at the gates between phases, the primary considerations Big Data teams review before taking the next step.

Many of the folks we talk to in our UAB, our membership, and at conferences agree that right now their Hadoop deployments are straddling the last gate: between a Point Solution (sometimes these are massive, with big business impact and huge volumes of data, but still focused on a single department/application) and going Enterprise-wide. Some folks we’ve talked to even said they could put specific dates on this image for when their deployments passed through these different phases. Can you?

It’s a very exciting juncture in the history of this amazing technology. Here at ODPi, we are squarely focused on collaborating as an industry to ensure the needed governance, security models and portability are in place to bring about predictable hybrid Hadoop.

In addition to our Runtime and Operations specifications and our ODPi Interoperable Applications program, we are also ushering in greater predictability through the work of our Special Interest Groups (SIGs), any of which we invite you to participate in:

  1. Data security and governance
  2. BI and Data Science
  3. Spark and Fast Data Analytics

These groups bring together downstream consumers of Hadoop and Big Data technologies (Hadoop platform vendors, ISVs/IHVs, solution providers, and end users) to discuss and provide recommendations to our technical community on the key challenges and opportunities in each area. Participation doesn’t require code contribution, just the contribution of your insights and expertise on how to bring about predictable hybrid Hadoop for the larger Big Data world.

Inside Big Data said it well: “Enterprises that apply Big Data analytics across their entire organizations, versus those that simply implement point solutions to solve one specific challenge, will benefit greatly by uncovering business or market anomalies or other risks that they never knew existed.” We couldn’t agree more.

The next blog in this series will contrast the operational considerations when running Hadoop in the lab/limited production versus running it enterprise-wide.