All Posts By

Kristen Evans

Certification Magazine: New free MOOC offers nuts-and-bolts intro to Apache Hadoop

By | News

In the not-too-distant past, all-things-to-all-surfers internet portal Yahoo! ran a marketing campaign that asked internet-savvy (and non-savvy) individuals one simple question: Do you, uh, Yahoo!? As Big Data continues to attract attention from businesses and organizations, a similar-sounding question is probably being asked of many IT professionals: Do you, uh, Hadoop? READ MORE.

The Linux Foundation, ODPi and edX Announce New, Free Intro to Apache Hadoop Course

By | Announcements

Massive Open Online Course (MOOC) to provide students with basic knowledge of the leading Big Data processing software

SAN FRANCISCO – March 30, 2017 – The Linux Foundation, the nonprofit advancing professional open source management for mass collaboration, today announced its newest massive open online course (MOOC) is available for registration. The course, LFS103x – Introduction to Apache Hadoop, is offered through edX, the nonprofit online learning platform launched in 2012 by Harvard University and Massachusetts Institute of Technology (MIT). This free course will begin in early June.

This is the fifth edX MOOC offered by The Linux Foundation. Its first course, Intro to Linux, has reached more than 800,000 students globally and continues to grow in registrations. The others are Intro to Cloud Infrastructure Technologies, Introduction to OpenStack and Introduction to DevOps: Transforming and Improving Operations.

Apache Hadoop is an open source project used for distributed processing of large sets of data. It is used by organizations large and small around the world to manage and analyze the massive amounts of data being created every single second of every day. A large number of additional open source software projects exist that can be installed and run on top of or alongside it to provide additional functionality. According to 451 Research, Hadoop will advance at a 38 percent compound annual growth rate (CAGR) through 2020 and reach $4.4 billion in revenue by 2020.

At the same time, the demand for individuals who have experience managing this platform is also accelerating. According to the IT Skills and Certifications Pay Index research from Foote Partners, “the need for big data skills also continues to lead to pay increases – about 8 percent over the last year,” making this an ideal time for individuals to start a career managing Big Data with Apache Hadoop.

“As innovation across the Hadoop landscape continues to skyrocket, we’re thrilled to provide accessible, vendor-neutral education for the Big Data community,” said ODPi’s Director, John Mertic. “ODPi is committed to reducing ecosystem complexity and, with Roman Shaposhnik leading this ‘Introduction to Apache Hadoop’ edX course, we look forward to sharing insights that make Hadoop manageable for organizations of all sizes.”

LFS103x is taught by Hadoop experts from The Linux Foundation’s ODPi project, which is committed to simplification and standardization of the big data ecosystem with common reference specifications and test suites. Shaposhnik, VP of Technology for ODPi at The Linux Foundation and the course instructor, is also a committer on Apache Hadoop, co-creator of Apache Bigtop, and contributor to various other Hadoop ecosystem projects. He is also an Apache Software Foundation member and a former Chair of Apache Incubator.

Students in the course will learn:

  • The origins of Apache Hadoop and its big data ecosystem
  • Deploying Hadoop in a clustered environment of a modern day enterprise IT
  • Building data lake management architectures around Apache Hadoop
  • Leveraging the YARN framework to effectively enable heterogeneous analytical workloads on Hadoop clusters
  • Leveraging Apache Hive for an SQL-centric view into the enterprise data lake
  • An introduction to managing key Hadoop components (HDFS, YARN and Hive) from the command line
  • Securing and scaling your data lakes in multi-tenant enterprise environments

“In today’s high-tech world, more data is created every day and increasingly organizations need professionals qualified to analyze it,” said edX CEO and MIT Professor Anant Agarwal. “We are pleased to again partner with The Linux Foundation to increase access to in-demand education, helping to bring Apache Hadoop expertise into the Big Data industry.”

The course includes six chapters, each with a short graded quiz at the end. A final exam is also required in order to complete the course. Students may take the complete course at no cost, or add a verified certificate of completion for $99.

For more information on The Linux Foundation’s training and certification programs, please visit:

About The Linux Foundation

The Linux Foundation is the organization of choice for the world’s top developers and companies to build ecosystems that accelerate open technology development and commercial adoption. Together with the worldwide open source community, it is solving the hardest technology problems by creating the largest shared technology investment in history. Founded in 2000, The Linux Foundation today provides tools, training and events to scale any open source project, which together deliver an economic impact not achievable by any one company. More information can be found at

About ODPi

ODPi is a nonprofit organization committed to simplification and standardization of the big data ecosystem with a common reference specification. As a shared industry effort, ODPi members represent big data technology, solution provider and end user organizations focused on promoting and advancing the state of Apache Hadoop® and big data technologies for the enterprise. For more information about ODPi, please visit:


Media Contact:

Natasha Woods


(415) 312-5289

ODPi Grows Its Membership With International Set of Data-Driven Companies

By | Announcements

China Mobile, High Octane, Innovyt and LizardFS join mission to solidify ODPi as the essential Open Source companion for enterprise-wide production Hadoop

SAN JOSE – Strata + Hadoop World, March 14, 2017 – ODPi, a nonprofit organization accelerating the open ecosystem of big data solutions, today announced China Mobile, High Octane, Innovyt and LizardFS have joined as members to help companies unlock value from disparate data and continue advancing the standardization of Apache™ Hadoop® and related big data solutions.

According to Brian Hopkins, VP & principal analyst at Forrester Research, a revenue model built by the firm – which included speaking to hundreds of firms and poring over three years of survey data with nearly 10,000 responses – conservatively forecasted that insights-driven businesses would earn about $400 billion in 2016; however, by 2020 they will be making over $1.2 trillion a year due to an astonishing compound annual growth rate between 27 percent and 40 percent.
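As a rough sanity check on the forecast above, the growth rate implied by its two endpoints can be worked out directly. The sketch below is illustrative only and is not part of Forrester's model: it assumes four annual compounding periods (2016 to 2020) and treats $1.2 trillion as the 2020 figure, although the forecast says "over" that amount. The function name `cagr` and the constants are hypothetical names chosen for this example.

```python
def cagr(start_value: float, end_value: float, periods: int) -> float:
    """Return the compound annual growth rate that grows start_value to end_value."""
    if start_value <= 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive")
    return (end_value / start_value) ** (1 / periods) - 1


# Endpoints quoted in the forecast, in billions of US dollars.
REVENUE_2016 = 400.0
REVENUE_2020 = 1200.0  # the forecast says "over" this, so it is a lower bound
YEARS = 2020 - 2016    # assumption: four annual compounding periods

implied_rate = cagr(REVENUE_2016, REVENUE_2020, YEARS)
print(f"Implied CAGR: {implied_rate:.1%}")  # prints "Implied CAGR: 31.6%"
```

Under these assumptions the implied rate of roughly 31.6 percent sits inside the quoted range of 27 to 40 percent.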

“ODPi is committed to serving the big data ecosystem by facilitating standardization and ensuring Hadoop reaches its full potential as an enterprise-wide production big data platform,” said John Mertic, Director, ODPi. “We’re thrilled to have a breadth of companies from China, Belgium, Poland and the U.S. join our efforts in validating big data as a sustainable, long-term area of investment for organizations worldwide.”

These new members bring the ODPi network of collaborative big data startups, enterprise service providers and software-driven end users to more than 35 companies – signifying the ecosystem’s growing desire to make Hadoop accessible and ready for every organization around the world.

About the newest members:

China Mobile is the biggest telecom operator in the world. The open source offerings extended by China Mobile Software Technology, including Hadoop product and professional services, serve more than 826 million customers.

“Becoming a member of ODPi will not only make customers’ choices among various Hadoop distributions far easier, but it will also help to build their confidence in China Mobile’s Hadoop product,” said Shaoling Sun, Executive Vice President of China Mobile Software Technology. “As industry compliance is the top priority of our big data products, we hope to soon have a unified, standard Hadoop version of the interface built on the ODPi reference specification.”

High Octane is a Belgium-based consulting firm, focused on companies’ enterprise architecture to help strategize, materialize and execute on their big data visions.

“High Octane is proud to join ODPi, as the organization is close to the upstream projects and focuses on a long-term operational view,” said Philippe Back, founder of High Octane. “As long-time believers in open source software, ODPi is a great vendor-neutral venue for us to share our Hadoop insights and learn from the ecosystem feedback provided by end users and SIGs from the field.”

Innovyt is a big data consulting company specializing in advanced analytics, cloud and data science focused on ensuring its customers can implement modern data-driven applications using big data technologies. Its team is committed to building a set of solution-centric frameworks that will expedite the implementation of leading platforms, like Hadoop.

“ODPi’s initiative to create interoperability and compliance for production Hadoop are incredibly meaningful – as these will provide our customers with standards, best practices and a common language,” said Vineet Kumar, founder of Innovyt. “We look forward to learning and contributing to open standards through our partnership with ODPi.”

LizardFS – an open-source Distributed File System licensed under GPLv3 – was developed and distributed by Skytechnology in Warszawa, Poland. The scalable, fault-tolerant and highly available file system ensures security by storing all data in many replicas spread across all available nodes and can be used to build an affordable storage cluster.

“Our membership to ODPi will enable LizardFS to take part in creating an industry standard for big data solutions and resolving challenges the industry faces, especially as it relates to the storage of data,” said Simon Haly, CSO of LizardFS. “We’re eager to join the initiative’s efforts and look forward to the direction ODPi gives companies just starting their journey in this thriving ecosystem.”

Improving Production Hadoop: ODPi Member Conversation with Ampool

By | Blog

Last month, John Mertic sat down for our first ODPi Member Conversation podcast with Milind Bhandarkar, founder and CEO of Ampool.

The exciting discussion centered around the challenges production Hadoop deployments face and how to make the framework faster, easier and more productive.

As he’s spent the last 11+ years working with the various versions of Hadoop – first starting at Yahoo!, where Hadoop was invented – Milind had some interesting context to share with podcast listeners.

After highlighting the changes the space has seen since Hadoop was first introduced to the world, he explained that today’s projects usually “depend on different projects or on different components in the Hadoop ecosystem.”

The importance of interoperability within these offerings, to ensure today’s software-defined companies are able to harness the full power of their data, cannot be overstated. John and Milind agreed that one of Hadoop’s biggest challenges in production has been ensuring that commercial distributions are compatible across multiple components and the applications that have been written to use these components.

To hear more of Milind and John’s expert insight, including more ways to improve production Hadoop, tune in to the episode on our YouTube channel!

Subscribe to our YouTube channel and follow us on Twitter to catch upcoming episodes of the ODPi Member Conversation podcast series!

2017 Predictions: What’s Next for Hadoop

By | Blog


By: John Mertic, Director of Program Management for ODPi

If you follow ODPi insight closely, you might remember these 2017 Big Data Predictions from our VP of Technology, Roman Shaposhnik. After the start of the new year, I started to think about what his predictions and emerging trends like Big Data’s “Push to the Cloud” might mean for our ecosystem – especially as it relates to the Hadoop landscape.

Last year, Apache Hadoop celebrated its tenth birthday. It was a milestone for the diaspora of the early team at Yahoo! that invented the technology and the worldwide community, along with The Apache Software Foundation that shepherded the growing platform since its launch. However, this decade-iversary also showcased something less obvious than Hadoop’s staying power: it brought to light that the canonical state of Hadoop is breaking apart.

Over the last couple of weeks, I’ve spent a lot of time reading through Hadoop and Big Data landscape articles written in the past few years. The most popular conversation was clearly the expansion of the stack – meaning new projects for every possible nook and cranny of the space. Fast data? Check. 12 ways to perform a SQL slice and dice? Done. AI (artificial intelligence) and ML (machine learning) capabilities? Yup. To see what I mean, take a look at this enormous Hadoop Ecosystem Table – summarizing current Hadoop related projects – here.

Traditionally, the role of Hadoop distribution providers within the ecosystem was to help make sense of a fast-changing and often-confusing landscape for customers. Showcasing their own preferred tools, distros gave the enterprise a stack of components that (more-or-less) worked well together – provided users stayed within confining application architecture walls. While this wasn’t ideal, it worked fairly well if enterprises were happy to stay in the “safe zone” their selected vendors laid out and could blissfully ignore other distros and solutions.

Though this may seem simple, the nature of deploying Big Data is quite varied. According to AtScale’s recent “Big Data Maturity” report, 53% of respondents reported using cloud in their deployment, but only 14% have all of their data in the cloud. Tony Baer’s recent ZDNet article adds that Hadoop in the cloud is a varied product depending upon the provider – and not in the traditional sense of how Cloudera CDH differs from Hortonworks HDP. This emergence of cloud brings into focus a fundamental shift emerging within the entire Big Data landscape.

If there is one overarching lesson the drive to PaaS and IaaS taught us, it would be the benefits of being lean. For example, you can throw more CPU, RAM and disk drives onto your on-premise environment with negligible cost increases; but for cloud instances, each addition counts against you quickly. Knowing this, the best cloud architectures include the ability to compartmentalize, identify focus areas of work and optimize each resource used – as wasting resources on the cloud has in-your-face cost ramifications.

Now combine Hadoop’s push to the cloud with the forced fiduciary responsibility of using cloud resources, and it’s quickly apparent that a traditional one-size-fits-all Hadoop distro is at natural odds – especially when that distro comes with a number of projects and tools that you’ve long-since outgrown.

My biggest prediction for 2017 is that the Hadoop of 2016 is going to become much more modular, special purpose and leaner than what is currently being shipped. We’re already seeing these trends in the following ways:

  • IBM’s Watson Data Platform is centered around Spark – notice anything missing?
  • Cloud vendors are moving away from traditional HDFS and, instead, making their native block stores the data lake
  • Even traditional Hadoop distro vendors are recognizing this trend and launching offerings leveraging containers as a stopgap solution

This slow elimination of the one-size-fits-all ideal leads me to my second prediction: Hadoop and Big Data will no longer be discussed as their own beings – they’ll instead just be referred to as “Data.” I see this acknowledgment as the separation line between vendors who will be successful in 2017 and those who will not. Connecting the entire landscape story together, and speaking to customers about their data strategy vs. shiny new Hadoop or Big Data products, will separate this year’s data winners from its data losers.

My third prediction for Hadoop: ridding the marketplace of the “traditional Hadoop” baggage, and having the important conversations around data strategy, will employ the needs of traditional business to highlight leading technologies in this space. While this may sound pretty obvious, try answering this: how many traditional businesses are bragging about the efficiency of their Hadoop/Big Data/Data solutions and strategies right now? Not many. However, these businesses know that in order to remain competitive they’ll need to become “data driven.” I think we’ll start seeing organizations drive their needs back to vendors like never before and their successes will be much more prominently showcased. In other words, less focus on Amazon, Netflix and Facebook, and more narratives around companies like Progressive Insurance.

It’s a key year for Big Data as it crosses its biggest chasm yet, but as greater focus comes to this industry I think we’ll start seeing a noticeable push forward – setting up some even more impressive leaps in 2018 and beyond.

insideBIGDATA: “Push to the Cloud: Five Things to Nail Down Before Making the Switch”

By | News

In this special guest feature, John Mertic, Director of Program Management for ODPi and Open Mainframe Project at The Linux Foundation, makes the argument that “When it comes to your data in the cloud, there are certain pieces to the technology puzzle you should have nailed down with five baseline things to address – regardless of your IT, data output, cloud provider and security – before making the switch…” READ MORE.

VMblog: “ODPi 2017 Predictions: Rapidly-Growing and Ever-Evolving Big Data Landscape”

By | News

ODPi is a nonprofit organization committed to simplification & standardization of the big data ecosystem. As the VP of Technology for the Linux Foundation project and PMC member of Apache Bigtop, Groovy, Geode, Ignite and Incubator; Committer to Apache Hadoop and Giraph; and Mentor for Apache HAWQ (incubating), Apache MADlib (incubating) and Apache Fineract (incubating) at Apache Software Foundation, I’m well-versed in Open Source trend identification and executing on strategies that complement these developments for the benefit of our community. I’ve put my trend observations for the big data landscape into the below predictions for the coming year. READ MORE.

ODPi Publishes First Operations Specification To Provide Developers Consistency Across Application Management Tools

By | Announcements

Leveraging Apache Ambari, Apache Hive and Hadoop Compatible File System support, ODPi 2.0 Release Standardizes Deployment Model for Enterprise Big Data Solutions   

Seville, Spain – Apache: Big Data Europe, November 14, 2016 — ODPi, a nonprofit organization accelerating the open ecosystem of big data solutions, today announced the availability of ODPi 2.0, which includes the first release of the ODPi Operations Specification and the Runtime Specification 2.0, to standardize the development model for big data solution and application providers and help enterprises improve installation and management of Hadoop-based applications.

With more than 30 members, including recently announced DriveScale, Redoop, and Xavient Information Systems, ODPi is focused on simplification and standardization within the big data ecosystem and further advancing the work of the Apache Software Foundation. Designed to make it easier to create big data solutions and data-driven applications, ODPi adds Apache Hive and Hadoop Compatible File System (HCFS) support as part of the ODPi Runtime Specification 2.0. Additionally, the ODPi 2.0 release includes Operations Specification 1.0, which provides standard guidelines for application management tools serving as reference platforms, including Apache Ambari.

“With the release of the first Operations Specification, ODPi is moving standardization forward for Apache projects in a pragmatic, fluid way that embraces developer input,” said John Mertic, Director of ODPi. “ODPi specifications are based on input from developers and enterprises and how they are actually using big data technologies in production environments, and address real issues they’ve encountered. Our technical team developed this latest release knowing that the SQL layer, backend storage, and how applications should be installed, managed and configured in an Apache Hadoop cluster are important to them. We’ll continue to iterate on previous releases and seek industry input to ensure that we are tackling the critical issues that benefit the wider big data ecosystem.”

Key ODPi Operations Specification 1.0 Technical Features

The ODPi Operations Specification 1.0 provides standard guidelines for application management tools, using Apache Ambari, the Apache Software Foundation project for provisioning, managing, and monitoring Apache Hadoop clusters, as a reference platform. By providing common expectations in guidelines, developers are able to create data-driven applications for all management tools used by platform providers. For big data solution and application providers, this minimizes the complexity, cost and training needed to build big data applications.

The ODPi community worked closely with the Apache Ambari community to develop the Operations Specification, ensuring backward compatibility with the standardization and alignment with the community’s needs. The ODPi community further designed this spec so that other management tools could attain compliance.

Similar to Spark, Ambari is a rapidly changing project. In working on the latest release, ODPi’s technical team collected substantial Ambari institutional knowledge, which they’ve contributed to Ambari. The reference manual will help developers more easily write an application for Ambari to manage their applications.

Key ODPi Runtime Specification 2.0 Technical Features

ODPi Runtime Specification 2.0 adds Apache Hive and Hadoop Compatible File System (HCFS) support to the YARN, MapReduce and HDFS components from ODPi Runtime Specification 1.0. HCFS support will enable storage and cloud vendors to leverage ODPi standards, empowering them to use their native storage solutions as part of an ODPi Runtime Compliant Hadoop Platform and reduce the incompatibilities that end users face. By including Apache Hive, ODPi will reduce SQL query inconsistencies across Hadoop Platforms. ODPi based its work on Hive version 1.2 and has included core functionality that will continue to behave in a standard way for future versions of Apache Hive. For more on this addition, read ODPi technical steering committee chair Alan Gates’ blog.

ODPi Compatibility and Interoperability

Several Apache Hadoop platform and big data solution and application providers, including Ampool, Hortonworks, IBM, Pivotal, and SAS, have committed to testing against ODPi 2.0 to become ODPi Compliant and ODPi Interoperable. They have the ability to test against both the Operations Specification 1.0 and Runtime Specification 2.0 separately, offering greater simplicity for big data solution and application providers. This option provides end-users greater choice and flexibility by fostering an open big data ecosystem that transcends traditional vendor alliances.

For more on how ODPi is helping enterprises boost the value they get from Hadoop and Big Data, read Rouda’s recent whitepaper and listen to the accompanying webinar.

Comments from Members


“Since the founding of ODPi, Ampool has been committed to helping to drive standardization both in the organization and in how our data services interoperate with multiple Hadoop platforms. With the release of the first operations specification, we’re looking forward to submitting Ampool’s Active Data Store for compliance in the coming months.” – Milind Bhandarkar, Ph.D., Founder and CEO, Ampool


“There is a major shift occurring on how data is treated within their organization. Fundamentally, it is no longer about the persistent stores, data in Hadoop, data in operational database and real-time streaming. It is about how that data is accessed in trust and used within an organization. By working with ODPi and committing to provide these organizations with a compliant platform they can count on and interoperable software that sits on top of Hadoop, including IBM Big SQL, IBM SPSS Analytic Server, IBM Big Replicate, and others, we are helping our customers build their businesses.” – Ritika Gunnar, Vice President of Offerings, IBM Analytics


“Complying with the latest version of the ODPi specification simplifies how Apache HAWQ can query the vast quantities of data in the popular Apache Hive format, and allows us to seamlessly integrate configuration and administration through Apache Ambari. ODPi is allowing us to roll out compatibility features with the Apache Hadoop ecosystem at a much faster pace.” – Jacque Istok, Head of Data Engineering, Pivotal Software


“As an ODPi member, we are reinforcing our commitment to ensuring that SAS applications work with and exploit the Hadoop distribution of our customers’ choice – while being able to bank on the stability and quality expected in demanding business environments. The availability of ODPi 2.0 allows us to more efficiently support our customers, while also enhancing the installation and management of the SAS application within their Hadoop environment.” – Craig Rubendall, Vice President of Platform R&D, SAS


DriveScale, Redoop, and Xavient Information Systems Join ODPi To Create Interoperable Big Data Ecosystem

By | Announcements

New members rally around compliance efforts to help organizations optimize and streamline Apache Hadoop implementations

Seville, Spain – Apache: Big Data Europe, November 14, 2016 – ODPi, a nonprofit organization accelerating the open ecosystem of big data solutions, today announced DriveScale, Redoop, and Xavient Information Systems have joined the organization to advance the simplification and standardization of Apache™ Hadoop® and related technologies.

With the incredible growth of Hadoop – forecasted to surpass $16 billion by 2020 – and related Apache project evolution, it has become difficult for developers to keep up with the pace of innovation. To make adoption of the platform more appealing to organizations looking to invest in big data technologies, ODPi members are committed to simplifying development and compatibility testing for applications.

“ODPi’s work to ensure interoperability of applications across a wide range of commercial Hadoop platforms is gaining momentum thanks to ongoing membership growth,” said John Mertic, director of program management, ODPi. “Hadoop has become a crucial part of any enterprise’s big data strategy. We are helping to mitigate complexity in the Hadoop ecosystem by facilitating standardization across big data technologies – ultimately spearheading newer and greater innovations.”

ODPi membership has grown to include more than 30 companies – encompassing industry-leading Apache Hadoop software companies, big data startups, enterprise service providers and end users.

About the newest members:

DriveScale provides a smarter way to build infrastructure for scale-out systems like Hadoop. Its composable data center architecture is provided via a set of on-premises and SaaS tools that coordinate between multiple levels of infrastructure. With DriveScale, companies can more easily support Hadoop deployments of any size, as well as other modern application workloads.

“Enterprises have varying big data needs that require flexible and interoperable platform components,” said Gene Banman, CEO of DriveScale. “Becoming a member of ODPi will allow us to better maximize data center efficiency for Hadoop with interoperability for enterprise-grade deployments.”

Redoop is a big data platform founded in China, devoted to ensuring enterprises reap the benefits of big data technology. Revolving around Hadoop Common to ensure a corresponding distributed system, Redoop provides the underlying optimization, system management, and data management to help enterprises build their own tailored data systems.

“Redoop believes that ODPi’s mission of creating a more open and interoperable ecosystem for current and potential Apache Hadoop users is meaningful to the entire big data community,” said XiaoJun Tong, founder of Redoop. “We look forward to joining the initiative and collaborating with its members for the benefit of our customers.”

Xavient is a global IT consulting and software services company, focusing on transforming business ideas into effective solutions. Xavient was among the first five solution and application providers to pledge a commitment to the ODPi Interoperable Compliance Program for its DiP (Data Ingestion Platform) – a real-time data analysis application – ensuring its ability to successfully run on multiple ODPi Runtime Compliant Platforms.

“Xavient is committed to providing customers with tailored capabilities and solution flexibility and making our real-time data analysis solution interoperable with ODPi Compliance Program applications was a natural next step,” said Neeraj Sabharwal, head of cloud, data & analytics at Xavient Information Systems. “We are thrilled to become a member of ODPi to contribute to an open big data ecosystem that transcends big vendor agendas.”
