The state of open source and big data – three years later

April 4, 2018

Originally posted on DataWorks Summit blog

ODPi turns 3 this year. It was first announced at the spring Strata+Hadoop World and brought under the auspices of the Linux Foundation later that year at the fall Strata+Hadoop World. Hadoop then turned 10 the following year, and seemed to be proclaimed dead, then alive, and then seemingly scrubbed from the world. One might think this meant the nail in the coffin for an organization centered on Hadoop standardization.

The Linux Foundation views open source projects as a life cycle driven by market needs. A common chart used to describe this is shown below.

In essence, open source foundations such as ODPi invest in developer communities, whose work enables accelerated delivery of new products to the marketplace and R&D cost savings for organizations. As this produces profits for these organizations, they push investment back into the projects and foundations that support this work. In present-day open source parlance, this practice is known as “Managing your Software Supply Chain”. An active cycle here is able to react and adapt to market demands, as well as take input from all stakeholders: developers, implementers, administrators, and end users.

So, as ODPi started to hit its stride in 2016, we talked with people across the data landscape. From these conversations, we quickly saw that the reported numbers for enterprise production adoption of big data technology were skewed, mostly because there was no solid definition of what adoption means. To better baseline the discussion, we came up with this maturity model of how big data technologies are adopted in the enterprise.

Applying this model showed that in 2017, nearly three-quarters of organizations were still not fully enterprise-wide in their deployment of big data. What’s blocking this? Data Governance: a broad and under-invested area, but one growing more critical by the day amid new regulations coming into play and breakdowns in managing data privacy.

ODPi’s belief is that tackling an issue as broad as Data Governance can only be done with all members of the data ecosystem participating: platform vendors, ISVs, end users, and data governance and privacy experts. This collaboration can only happen in a vendor-neutral space, which is why ODPi has launched a PMC focused solely on this area.

During DataWorks Summit Berlin, there will be numerous sessions and Meetups around this effort to help you learn more:

We will also be active in the community showcase, where you can chat directly with the experts in this area and learn how to participate in this effort.

Bringing it back to the original question: we are three years into this journey of creating sustainability in big data. We’ve had successes in reducing the number of disparate platforms and in bringing market awareness to the issues enterprises face in adopting these tools. Now the community is poised to take the lessons learned and build a strong community around governance to solidify this practice. Are the challenges different from three years ago? Absolutely. However, the goal of enterprise adoption remains the same, and with that, we see that big data is becoming more mature and more inclusive, and is building a more collaborative community.