By: John Mertic, Director of Program Management for ODPi
If you follow ODPi insights closely, you might remember these 2017 Big Data Predictions from our VP of Technology, Roman Shaposhnik. After the start of the new year, I began to think about what his predictions and emerging trends like Big Data’s “push to the cloud” might mean for our ecosystem – especially as it relates to the Hadoop landscape.
Last year, Apache Hadoop celebrated its tenth birthday. It was a milestone for the diaspora of the early Yahoo! team that invented the technology, for the worldwide community, and for The Apache Software Foundation, which has shepherded the growing platform since its launch. However, this decade-iversary also showcased something less obvious than Hadoop’s staying power: it brought to light that the canonical Hadoop stack is breaking apart.
Over the last couple of weeks, I’ve spent a lot of time reading through Hadoop and Big Data landscape articles written in the past few years. The most popular conversation was clearly the expansion of the stack – meaning new projects for every possible nook and cranny of the space. Fast data? Check. 12 ways to perform a SQL slice and dice? Done. AI (artificial intelligence) and ML (machine learning) capabilities? Yup. To see what I mean, take a look at this enormous Hadoop Ecosystem Table – summarizing current Hadoop-related projects – here.
Traditionally, the role of Hadoop distribution providers within the ecosystem was to help customers make sense of a fast-changing and often-confusing landscape. By showcasing their own preferred tools, distros gave the enterprise a stack of components that (more or less) worked well together – provided users stayed within confining application-architecture walls. While this wasn’t ideal, it worked fairly well as long as enterprises were happy to stay in the “safe zone” their selected vendors laid out and could blissfully ignore other distros and solutions.
Though this model may seem simple, the nature of deploying Big Data is quite varied. In AtScale’s recent “Big Data Maturity” report, 53% of respondents reported using the cloud in their deployments, but only 14% had all of their data in the cloud. Add to that Tony Baer’s recent ZDNet article, which notes that Hadoop in the cloud is a different product depending on the provider – and not in the familiar sense of how Cloudera CDH differs from Hortonworks HDP. The rise of the cloud brings into focus a fundamental shift underway across the entire Big Data landscape.
If there is one overarching lesson the drive to PaaS and IaaS has taught us, it is the value of being lean. You can throw more CPU, RAM, and disk at an on-premises environment with negligible cost increases; in the cloud, each addition counts against you quickly. Knowing this, the best cloud architectures compartmentalize workloads, identify focused areas of work, and optimize every resource used – because wasted resources in the cloud have in-your-face cost ramifications.
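To make “each addition counts against you” concrete, here’s a back-of-the-envelope sketch in Python. Every figure is a hypothetical placeholder, not a quote from any provider:

```python
# Back-of-the-envelope cloud cost sketch. All figures below are
# hypothetical placeholders, not real prices from any provider.
ON_PREM_MARGINAL_COST = 0.02  # $/hour for extra CPU/RAM on hardware you already own
CLOUD_NODE_COST = 0.50        # $/hour for one additional cloud instance

def monthly_cost(nodes: int, hourly_rate: float, hours: float = 730) -> float:
    """Cost of running `nodes` instances for `hours` in a month."""
    return nodes * hourly_rate * hours

# Adding ten nodes on-premises vs. in the cloud, running 24/7:
print(f"on-prem, 24/7 : ${monthly_cost(10, ON_PREM_MARGINAL_COST):,.2f}/month")
print(f"cloud, 24/7   : ${monthly_cost(10, CLOUD_NODE_COST):,.2f}/month")

# Cloud economics reward shutting idle capacity down, e.g. a cluster
# that runs only 8 hours a day, 22 business days a month:
print(f"cloud, 8h/day : ${monthly_cost(10, CLOUD_NODE_COST, hours=8 * 22):,.2f}/month")
```

The point isn’t the specific numbers; it’s that every idle cloud node shows up on the bill, which is why compartmentalized, right-sized architectures win there.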
Now combine Hadoop’s push to the cloud with the fiscal discipline that cloud resources force, and it quickly becomes apparent that a traditional one-size-fits-all Hadoop distro is at natural odds with this model – especially when that distro comes with a number of projects and tools that you’ve long since outgrown.
My biggest prediction for 2017 is that the Hadoop of 2016 will become much more modular, special-purpose, and leaner than what is currently being shipped. We’re already seeing these trends in the following ways:
- IBM’s Watson Data Platform is centered around Spark – notice anything missing?
- Cloud vendors are moving away from traditional HDFS and, instead, making their native object stores the data lake (a minimal sketch of this pattern follows the list)
- Even traditional Hadoop distro vendors are recognizing this trend and launching offerings leveraging containers as a stopgap solution
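To illustrate the object-store point above, here is a minimal PySpark sketch. It assumes a Spark deployment with the appropriate cloud connector (for example, Hadoop’s S3A) on the classpath; the bucket and column names are hypothetical placeholders:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("object-store-data-lake")  # hypothetical app name
    .getOrCreate()
)

# Instead of an HDFS path like hdfs://namenode:8020/warehouse/events,
# point Spark directly at the cloud provider's native object store:
events = spark.read.parquet("s3a://example-data-lake/warehouse/events")

# From here, the analysis looks identical to the HDFS-backed case:
events.groupBy("event_type").count().show()
```

Because the data lives outside the cluster, compute can be sized up, down, or shut off without touching the data – exactly the leanness the cloud rewards.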
This slow elimination of the one-size-fits-all ideal leads me to my second prediction: Hadoop and Big Data will no longer be discussed as their own beings – they’ll instead just be referred to as “Data.” I see this acknowledgment as the separation line between vendors who will be successful in 2017 and those who will not. Connecting the entire landscape story together, and speaking to customers about their data strategy rather than shiny new Hadoop or Big Data products, will separate this year’s data winners from its data losers.
My third prediction for Hadoop: shedding the “traditional Hadoop” baggage and having the important conversations around data strategy will let the needs of traditional businesses determine the leading technologies in this space. While this may sound obvious, try answering this: how many traditional businesses are bragging about the efficiency of their Hadoop/Big Data/Data solutions and strategies right now? Not many. Yet these businesses know that to remain competitive they’ll need to become “data-driven.” I think we’ll start seeing organizations push their needs back to vendors like never before, and their successes will be showcased much more prominently. In other words, less focus on Amazon, Netflix, and Facebook, and more narratives around companies like Progressive Insurance.
It’s a key year for Big Data as it crosses its biggest chasm yet. As greater focus comes to the industry, I think we’ll start seeing a noticeable push forward – setting up even more impressive leaps in 2018 and beyond.