
Making Production Hadoop Faster, Easier and More Productive: ODPi Member Conversation with zData


In our third ODPi Member Conversation podcast, John Mertic met with zData’s Senior Solution Architect, Gagan Brahmi.

Their in-depth conversation centered around the struggles commercial and enterprise corporations face with Apache Hadoop deployments, along with why standardization is an important driver in the Big Data community.

Brahmi has a unique perspective on Hadoop’s role in the Big Data industry and offered specific insight around the ways today’s enterprises can derive the maximum value out of the Hadoop cluster and their data.

Touching on how the framework’s capabilities have grown tremendously in the last few years, and urging enterprises to recognize how many tools throughout the ecosystem now complement one another, he noted “When we are dealing with [this] tool to make sure we derive the maximum value out of our existing systems, we have to keep an eye on all the tools available.”

To hear more of Gagan and John’s discussion, tune in to the episode on our YouTube channel!

Subscribe to our YouTube channel and follow us on Twitter to catch upcoming episodes of the ODPi Member Conversation podcast series!

Making Production Hadoop Faster, Easier and More Productive: ODPi Member Conversation with Pivotal


In our second ODPi Member Conversation podcast, John Mertic sat down with Pivotal Software’s Head of Data, Jacque Istok.

Their engaging discussion focused on the challenges enterprises face when trying to get value out of their data in Hadoop.

As a founding member of ODPi, and former technologist with a long history in data warehousing and data analytics, Jacque also weighed in on standards and how they enable application interoperability, portability, governance and security across the ecosystem.

Tying the importance of standards into his thoughts around ODPi, he noted “Our vision has always been to make it easier for customers and products/vendors/projects to be able to interact with an enterprise standard for Hadoop in an easy and common way.”

To hear more of Jacque and John’s insight, including why it was important to Pivotal that ODPi was a core organization with a common goal for the Hadoop platform itself, tune in to the episode on our YouTube channel!

Subscribe to our YouTube channel and follow us on Twitter to catch upcoming episodes of the ODPi Member Conversation podcast series!

ODPi 2.1: a “tick” for the future “tock”


By: Roman Shaposhnik, VP of Technology at ODPi

The release of ODPi 2.1 marks five months’ worth of the ODPi technical community’s diligent work, though on the surface it may appear to be an incremental change to last fall’s 2.0 release. While there aren’t any big, splashy additions to our specification, this release is very noteworthy in its own way. Why? Because it follows in the great tradition of tick-tock releases and invests a lot of energy into the underlying infrastructure that is largely invisible to the consumer. This, of course, makes it a “tick” release, and those are truly foundational to the success of the follow-up “tocks” that get all the excitement. If you still don’t believe tick-tock pairs well with complex systems, ask any Sun Microsystems SPARC engineer how well an alternative release model worked out for them (I believe they called it humpty-dumpty). But I digress; back to ODPi 2.1.

One of the biggest underlying changes in ODPi 2.1 is that we have fully transitioned to leveraging Apache Bigtop for our reference implementation and validation test suite needs. This required a lot of upstream backporting. Some of it was pretty straightforward, such as backporting all ODPi-developed tests into Bigtop, while some required us to engage with upstream communities and get their feedback on the best way to accomplish a similar goal. This was the story of our ODPi reference implementation stack for Apache Ambari. It started as a custom stack that was shipped as part of the ODPi reference implementation but, after receiving community feedback, it evolved into a standalone management pack that can now be developed and shipped independently of Ambari. This outcome benefits everybody because now any product based on Ambari can simply point at the management pack and deploy the ODPi reference implementation.
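As a rough sketch of what that looks like in practice (the mpack path below is a placeholder for illustration, not the actual ODPi artifact location):

```shell
# Sketch: registering a standalone management pack with an Ambari server
# (Ambari 2.4+). The tarball path is a placeholder, not the real ODPi artifact.
ambari-server install-mpack --mpack=/tmp/odpi-reference-mpack.tar.gz --verbose

# Restart the server so the new stack definition is picked up.
ambari-server restart
```

Because the management pack lives outside the Ambari codebase, it can rev on its own schedule without waiting for an Ambari release.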

ODPi 2.1 is our first release consisting of just the specifications; all of the software artifacts are now released through the Apache Software Foundation. Such renewed alignment with upstream community efforts allows us to be much more in tune with big data practitioners, regardless of whether they participate in ODPi directly or not. This is a win-win for both ODPi and the upstream ASF communities. If the Bigtop 1.2.0 release was any indication, ODPi’s focus on enterprise stability and readiness brings to light a lot of issues that would otherwise go unnoticed or would only be fixed in vendor-specific patch releases. ODPi’s Bigtop collaboration brings these issues closer to the source, creating a feedback loop that results in much faster fixes.

On the flip side, Bigtop’s extensive platform coverage and a vibrant community of ASF developers means the ODPi specification will bring value far beyond what we believe are our core deployment targets. For example, we’ve never really considered IBM’s POWER as a supported ODPi platform, but since Bigtop runs on this hardware, we get it for free. Starting from ODPi 2.1, all of the engineering work will happen directly in the upstream ASF communities, and we expect this to make our development cycle extremely agile and asynchronous. Of course, we’ll continue releasing the specifications, which brings me to the last part of this release.

Most of our effort on the Operations spec was focused on standardizing Ambari 2.5 and taking care of upgrade and backward-compatibility guarantees for future ODPi releases. On the Runtime side, we spent quite a bit of time future-proofing it against Hive 2.0 (and looking at how known incompatibilities with Hive 1.2 can affect ISVs and end users). We also considered Spark 2.0 as the next component on which to standardize.

New Special Interest Groups Spark Exploratory Developments

Our Spark 2.0 work was interesting in its own right. Our take was that while Spark was still considered experimental and not at the level of maturity required of ODPi Core components, it was still highly important to enterprise readiness. We’re tackling this through the loose construct of Special Interest Groups (SIGs), rather than the highly rigorous body of a Runtime PMC. Thus, Spark gave birth to our first SIG: the Spark and Fast Data Analytics SIG.

With the increase in the popularity and usage of Hadoop and Spark, the notion of Spark replacing Hadoop is gaining traction. While this is possible in some use cases, Spark is already part of Hadoop and there are several components from the Hadoop stack on which Spark depends. Our Spark and Fast Data Analytics SIG, led by Pradeep Roy, advisory software engineer at IBM, is expected to publish guidelines for Spark deployment and recommend best practices on Spark and Hadoop use, along with providing guidelines for different deployment methods for Spark on YARN, Mesos or Spark standalone; comparisons of different SQL on Hadoop solutions; and more.
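To illustrate the deployment methods the SIG is comparing, here is how the same application is submitted under each of the three cluster managers (hostnames, ports and the jar path are placeholders, not values from any ODPi guideline):

```shell
# The same Spark application submitted under the three cluster managers
# discussed above. Hostnames, ports and app.jar are placeholder values.

# Spark on YARN: resource management comes from the Hadoop stack.
spark-submit --master yarn --deploy-mode cluster app.jar

# Spark on Mesos: a general-purpose cluster manager shared with other workloads.
spark-submit --master mesos://mesos-master:5050 --deploy-mode cluster app.jar

# Spark standalone: Spark's own built-in cluster manager.
spark-submit --master spark://spark-master:7077 app.jar
```

The choice of `--master` is the main fork in the road; the SIG’s guidelines are expected to weigh the operational tradeoffs behind it.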

The formation of two new SIGs, Data Security and Governance SIG and BI and Data Science SIG, quickly followed.

Our Data Security and Governance SIG was formed to give industry experts a place to collaborate on a set of best practices aimed at solving the complexities of dealing with multi-tenant Big Data lakes in a secure fashion, with consideration for the control points demanded by enterprise regulatory environments and compliance policies. As the leader of this group, my fellow members and I plan to produce a series of whitepapers and validation test suites addressing both platform considerations and the solutions practitioners may need to augment their platform practices. This SIG’s first deliverable will be a Security Guide Handbook, developed on GitHub by members from IBM, Hortonworks and Pivotal, that will bring much-needed clarity to securing Hadoop-based data lake infrastructure. We’ve also started working on codifying security-related deployment recommendations as part of the Apache Bigtop deployment capabilities, thus providing baseline security functionality for the entire Hadoop ecosystem. Stay tuned for our outputs, coming soon!

Our BI & Data Science SIG, according to the group’s champion Cupid Chan, managing partner of 4C Decision, has a two-fold goal. The first goal is to help bridge the gap between Relational Database Management Systems (RDBMS) and Hadoop so that BI tools can sit harmoniously on top of both systems, while providing the same, or even more, business insight to BI users who also use Hadoop in the backend. The second goal is to collaboratively explore ways for Data Science to better leverage the underlying Hadoop ecosystem. To start with an achievable result, the first deliverable for this SIG is a “Data Science Notebook Guideline.” Stay tuned for the release of this group’s findings!

While these SIGs are still very young, they are pushing forward important exploratory work that, we hope, will form a basis for some of the future PMCs and specification updates within the broader scope of ODPi.

These SIGs also represent our lowest barrier of entry to date – so, if you feel like contributing to ODPi efforts but don’t know where to start, we encourage you to join an existing SIG or propose a new one.

By default, SIGs use the odpi-technical mailing list for all online communication between SIG members. This means that all you have to do to join a SIG is drop an email to the odpi-technical mailing list, introduce yourself and briefly describe why you are interested in the SIG’s activity. Include your GitHub ID in the introductory email so that a SIG Champion can add you to the GitHub group.

Contributing to the ODPi community is that easy!

Predictable Hybrid Hadoop Blog Series – Crossing the Chasm


In the previous blog in this series, we outlined some of the important ways that running Hadoop in production – especially in enterprise-wide production – differs from point solutions and PoCs.

As a leading downstream community of big data vendors, users and platform providers, ODPi is focused on tackling the security, governance, lifecycle management and application portability needed to run Hadoop at scale.

A classic way to think about technology maturity is Geoffrey Moore’s Chasm model. In our new white paper, we plot key Hadoop milestones against the technology adoption curve (see image below) and argue that the things the ODPi community are focused on are essential to continuing the adoption of this transformative technology.

An adaptation of Everett Rogers’ famous S-shaped diffusion of innovations curve, the Chasm model argues that users on the left of the chasm are fundamentally different from those on the right. The chasm separates users by adoption trigger and motivation: on the left it’s all about competitive advantage at nearly any cost; on the right it’s about continuity of operations and keeping up with the Joneses.

As awesome as this model is, it has sometimes been co-opted. One way this happens is by applying it to a Product when in fact it needs to apply to a Category. This is one reason why we are so bullish about our work at ODPi: we explicitly acknowledge that the only way Hadoop and associated Big Data solutions can cross the chasm to mainstream adoption is by working together to define category-wide, NOT vendor-specific, answers to questions like lifecycle management, security and governance, and application portability. These are the things that address early and late majority users’ interest in stability and operational continuity.

When thinking about what it really means for a technology to be a platform, we like the way Sam Ghod puts it:

A platform abstracts away a messy problem so you can build on top of it. Platforms do this by delivering portability and extensibility.

With ODPi Releases 1.0 and 2.0 in place, we invited application vendors to self-certify that their applications work unmodified across multiple ODPi Runtime Compliant Hadoop distros. As of this writing, twelve applications from leading vendors like SAS, IBM and DataTorrent have completed the self-certification.

We believe that savvy Enterprise CDOs, CIOs, CTOs and Chief Information Security Officers (CISOs) should carefully consider the platform independence that ODPi’s Interoperable Apps program delivers before making their Hadoop platform choices. If one of your preferred vendors isn’t listed either as an Interoperable App or as a Runtime Compliant Platform, let that vendor know that it matters to you.

In 2017, we’re heads down adding to our existing specifications and creating new workstreams through our Special Interest Groups. We invite you to get involved. If you are a Twitter user, be sure to follow @odpiorg and participate in our ongoing polls.

Looking at the latest Gartner Magic Quadrant for Business Intelligence and Analytics Platforms


By John Mertic

I spent some time reviewing the latest Gartner Magic Quadrant for Business Intelligence and Analytics Platforms in preparation for my time at the Gartner Data and Analytics Summit last week. Overall, I’m really excited to see vendors scoring higher in ‘Ability to Execute’; Gartner judges this toughly, so the general shift upwards is great to see.

While the piece is clearly targeted towards buyers of these tools, I wanted to take a critical eye to the positioning of vendors in relation to their interoperability with Big Data and Hadoop tools. After all, it was a mere decade ago that the entire data space was covered by a single Gartner analyst. Enter the age of Big Data; with its variability, velocity, and volume has come a cornucopia of products, strategies, and opportunities for answering the data question.

In the same way, BI and analytics has moved from being purely the realm of “data at rest” to becoming cohesive with “data in motion”. It’s no surprise, then, to see two “pure play big data” BI vendors, Datameer and ZoomData, joining ClearStory, which joined the MQ last year – cementing the enterprise production need for valuable data insights. And, with a tip of the hat to the new breed of open source trailblazers such as Hortonworks, these vendors heavily leverage Hadoop and Spark not just as another data source but as a tool to better process data – letting them focus on their core competency of delivering business insights.

However, what really struck me was the positioning of data governance as a whole in this report – let’s dig into that more.

Data governance and discovery is being pushed farther out

If you compare the 2016 report to the 2017 report, you’ll immediately notice this line from 2016…

By 2018, smart, governed, Hadoop-based, search-based and visual-based data discovery will converge in a single form of next-generation data discovery that will include self-service data preparation and natural-language generation.

…became…

By 2020, smart, governed, Hadoop/Spark-, search- and visual-based data discovery capabilities will converge into a single set of next-generation data discovery capabilities as components of modern BI and analytics platforms.

A two-year slip appearing in the span of a single year is noteworthy – clearly there is a continuing gap in converging these technologies. This aligns with what our members and the end users in our UAB mention as well: the lack of a unified standard here is hurting adoption and investment.

Governance no longer considered a critical capability for a BI vendor

This really stood out to me in light of the point above – it sounds like Gartner believes that governance will need to happen at the data source rather than the access point. It’s a clear message that better data management needs to happen in the data lake – we can’t secure at the endpoints for true enterprise production deployment. This again supports the need for driving standards in the data security and governance space.

I recently sat down with IBM Analytics’ WW Analytics Client Architect Neil Stokes on our ODPi Member Conversations podcast series, and data lakes figured prominently in our discussion. To listen to this podcast, visit the ODPi YouTube channel.

I’m reminded of the H. L. Mencken quote: “For every complex problem there is an answer that is clear, simple, and wrong.” Data governance is hard, and it is never going to be something one vendor solves in a vacuum. That’s why I’m really excited to see the output of both our BI and Data Science SIG and our Data Security and Governance SIG in the coming months. Starting the conversation in the context of real-world usage, looking at both the challenges and opportunities, is the key to building any successful product. Perhaps this work can be the catalyst for smarter investment and value-adds as these platforms continue to grow and mature.

Predictable Hybrid Hadoop Blog Series – DataOps Considerations From Lab to Enterprise-wide Production


In last week’s blog, The Hadoop Deployment Continuum, we covered how “in production” actually refers to a very diverse set of deployment scenarios. Anything from a PoC to a point solution, a departmental deployment, or an enterprise-wide rollout can be, and often is, called “production” use.

This blog focuses on the step-change DataOps requirements that come when you take Hadoop into enterprise-wide production.

As enterprises plan to move Hadoop and Big Data into enterprise-wide production scale out, they face a number of challenges.

Table 1, taken from our recent White Paper, details how running Hadoop and Big Data at enterprise-wide production requires a significant re-think across multiple dimensions.

The good news is that these are the very same challenges that the ODPi big data community has been working on for over a year. Through our ODPi Compliance and Interoperable Apps programs, enterprises get stacks that are validated across a number of platforms, providing needed support for multi-vendor procurement policies. In the words of Gene Banman, CEO of ODPi member DriveScale: “Enterprises have varying big data needs that require flexible and interoperable platform components. Becoming a member of ODPi will allow us to better maximize data center efficiency for Hadoop with interoperability for enterprise-grade deployments.”

Our ongoing work to validate workloads across cloud environments promises to extend ODPi predictability even further.

From a lifecycle management perspective, our Application Installation and Management specification covers requirements and guarantees for custom service specifications and views. Importantly, this spec, like all ODPi specs, is developed in the open and guided by the ODPi Technical Steering Committee (TSC), which is pulled from the entire Big Data industry. ODPi benefits from the involvement of end users, Hadoop platform providers, solution providers, and ISVs.

Last but certainly not least, our Special Interest Groups (SIGs) are looking into the following areas that are key to predictable enterprise-wide operations: data security and governance, BI and data science, and Spark and fast data analytics.

If these things matter to you, we invite you to get involved with any of these SIGs and/or join our slack channel and work with us to co-create a predictable hybrid future for Hadoop.

Predictable Hybrid Hadoop Blog Series – The Hadoop Deployment Continuum


In working on the recent ODPi White Paper, a few things have come into much sharper focus for the team here.

First is that “Production” is a loaded term. Even though you’ve got really good research from places like AtScale reporting that 73% of respondents run Hadoop in production, we think this term needs to be unpacked.

That’s why we worked across our community, including ODPi members and participants in our User Advisory Board, on this Enterprise Hadoop Deployment Continuum graphic.

The very simple idea here is to plot Hadoop deployments from the lab all the way to enterprise-wide production use, and to lay out, at the gates between phases, the primary considerations Big Data teams review before taking the next step.

Many of the folks we talk to in our UAB, our membership and at conferences agree that right now, their Hadoop deployments are straddling the last gate, between Point Solution (sometimes these are massive with big business impact and huge volumes of data, but still focused on a single department/application) and looking to go Enterprise-wide. Some folks we’ve talked to even said they could put specific dates on this image when Hadoop has passed through these different phases. Can you?

It’s a very exciting juncture in the history of this amazing technology. Here at ODPi, we are squarely focused on collaborating as an industry to ensure the needed governance, security models and portability are in place to bring about predictable hybrid Hadoop.

In addition to our Runtime and Operations specifications and our ODPi Interoperable Applications program, we are also ushering in greater predictability through the work of our Special Interest Groups (SIGs), any of which we invite you to participate in:

  1. Data security and governance
  2. BI and Data Science
  3. Spark and Fast Data Analytics

These groups bring together downstream consumers of Hadoop and Big Data technologies (Hadoop platform vendors, ISVs/IHVs, solution providers, and end users) to discuss and provide recommendations to our technical community on the key challenges and opportunities in each area. Participation doesn’t require code contribution – just the contribution of your insights and expertise on how to bring about predictable hybrid Hadoop for the larger Big Data world.

Inside Big Data said it well: “Enterprises that apply Big Data analytics across their entire organizations, versus those that simply implement point solutions to solve one specific challenge, will benefit greatly by uncovering business or market anomalies or other risks that they never knew existed.” We couldn’t agree more.

The next blog in this series will contrast the operational considerations of running Hadoop in the lab or limited production versus running it enterprise-wide.

Improving Production Hadoop: ODPi Member Conversation with Ampool


Last month, John Mertic sat down for our first ODPi Member Conversation podcast with Milind Bhandarkar, founder and CEO of Ampool.

The exciting discussion centered around the challenges production Hadoop deployments face and how to make the framework faster, easier and more productive.

As he’s spent the last 11+ years working with the various versions of Hadoop – first starting at Yahoo!, where Hadoop was invented – Milind had some interesting context to share with podcast listeners.

After highlighting the changes the space has seen since Hadoop was first introduced to the world, he explained that today’s projects usually “depend on different projects or on different components in the Hadoop ecosystem.”

The importance of interoperability within these offerings – ensuring today’s software-defined companies are able to harness the full power of their data – cannot be overstated. John and Milind agreed that one of Hadoop’s biggest challenges in production has been ensuring that commercial distributions are compatible across multiple components and the applications written to use those components.

To hear more of Milind and John’s expert insight, including more ways to improve production Hadoop, tune in to the episode on our YouTube channel!

Subscribe to our YouTube channel and follow us on Twitter to catch upcoming episodes of the ODPi Member Conversation podcast series!

2017 Predictions: What’s Next for Hadoop


By: John Mertic, Director of Program Management for ODPi

If you follow ODPi insight closely, you might remember these 2017 Big Data Predictions from our VP of Technology, Roman Shaposhnik. After the start of the new year, I started to think about what his predictions and emerging trends like Big Data’s “Push to the Cloud” might mean for our ecosystem – especially as it relates to the Hadoop landscape.

Last year, Apache Hadoop celebrated its tenth birthday. It was a milestone for the diaspora of the early Yahoo! team that invented the technology, for the worldwide community, and for The Apache Software Foundation, which has shepherded the growing platform since its launch. However, this decade-iversary also showcased something less obvious than Hadoop’s staying power: it brought to light that the canonical state of Hadoop is breaking apart.

Over the last couple of weeks, I’ve spent a lot of time reading through Hadoop and Big Data landscape articles written in the past few years. The most popular conversation was clearly the expansion of the stack – meaning new projects for every possible nook and cranny of the space. Fast data? Check. 12 ways to perform a SQL slice and dice? Done. AI (artificial intelligence) and ML (machine learning) capabilities? Yup. To see what I mean, take a look at the enormous Hadoop Ecosystem Table summarizing current Hadoop-related projects.

Traditionally, the role of Hadoop distribution providers within the ecosystem was to help make sense of a fast-changing and often-confusing landscape for customers. Showcasing their own preferred tools, distros gave the enterprise a stack of components that (more-or-less) worked well together – provided users stayed within confining application architecture walls. While this wasn’t ideal, it worked fairly well if enterprises were happy to stay in the “safe zone” their selected vendors laid out and could blissfully ignore other distros and solutions.

Though this may seem simple, the nature of deploying Big Data is quite varied. Reading through AtScale’s recent “Big Data Maturity” report, 53% of respondents reported using cloud in their deployment, but only 14% have all of their data in the cloud. Not to mention Tony Baer’s recent ZDNet article citing that Hadoop in the cloud is a varied product depending upon the provider – and not in the traditional sense of how Cloudera CDH differs from Hortonworks HDP. This emergence of cloud brings into focus a fundamental shift emerging within the entire Big Data landscape.

If there is one overarching lesson the drive to PaaS and IaaS has taught us, it is the benefit of being lean. For example, you can throw more CPU, RAM and disk drives into your on-premise environment with negligible cost increases; but for cloud instances, each addition quickly counts against you. Knowing this, the best cloud architectures include the ability to compartmentalize, identify focus areas of work and optimize each resource used – wasting resources in the cloud has in-your-face cost ramifications.
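To make that asymmetry concrete, here is a toy back-of-the-envelope calculation; every price in it is an illustrative assumption, not a real hardware or cloud rate:

```python
# Toy model: marginal cost of extra capacity on-premise vs. always-on cloud.
# All figures are illustrative assumptions, not real vendor prices.

ONPREM_NODE_UPFRONT = 8000.0   # assumed one-time hardware cost per node
ONPREM_NODE_MONTHLY = 50.0     # assumed power/cooling per node per month
CLOUD_NODE_HOURLY = 0.50       # assumed per-node hourly instance rate
HOURS_PER_MONTH = 730

def onprem_cost(nodes: int, months: float) -> float:
    """Total cost of buying and running `nodes` on-premise for `months`."""
    return nodes * (ONPREM_NODE_UPFRONT + ONPREM_NODE_MONTHLY * months)

def cloud_cost(nodes: int, months: float) -> float:
    """Total cost of keeping `nodes` cloud instances up around the clock."""
    return nodes * CLOUD_NODE_HOURLY * HOURS_PER_MONTH * months

# A steady, always-on 10-node cluster over 3 years: amortized hardware wins.
print(onprem_cost(10, 36))   # 98000.0
print(cloud_cost(10, 36))    # 131400.0

# A burst job needing 10 nodes for one week: the cloud wins only if the
# instances are actually turned off afterwards.
print(cloud_cost(10, 0.25))  # 912.5
```

The exact crossover obviously depends on real prices, but the shape of the tradeoff is why lean, compartmentalized architectures pay off in the cloud.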

Now combine Hadoop’s push to the cloud with the forced fiduciary responsibility of using cloud resources, and it’s quickly apparent that a traditional one-size-fits-all Hadoop distro is at natural odds – especially when that distro comes with a number of projects and tools that you’ve long-since outgrown.

My biggest prediction for 2017 is that the Hadoop of 2016 is going to become much more modular, special-purpose and leaner than what is currently being shipped. We are already seeing these trends in the following ways:

  • IBM’s Watson Data Platform is centered around Spark – notice anything missing?
  • Cloud vendors are moving away from traditional HDFS and, instead, making their native block stores the data lake
  • Even traditional Hadoop distro vendors are recognizing this trend and launching offerings leveraging containers as a stopgap solution

This slow elimination of the one-size-fits-all ideal leads me to my second prediction: Hadoop and Big Data will no longer be discussed as their own beings – they’ll instead just be referred to as “Data.” I see this acknowledgment as the separation line between vendors who will be successful in 2017 and those who will not. Connecting the entire landscape story together, and speaking to customers about their data strategy vs. shiny new Hadoop or Big Data products, will separate this year’s data winners from its data losers.

My third prediction for Hadoop: ridding the marketplace of the “traditional Hadoop” baggage, and having the important conversations around data strategy, will let the needs of traditional businesses highlight the leading technologies in this space. While this may sound pretty obvious, try answering this: how many traditional businesses are bragging about the efficiency of their Hadoop/Big Data/Data solutions and strategies right now? Not many. However, these businesses know that in order to remain competitive they’ll need to become “data driven.” I think we’ll start seeing organizations drive their needs back to vendors like never before, and their successes will be much more prominently showcased. In other words, less focus on Amazon, Netflix and Facebook, and more narratives around companies like Progressive Insurance.

It’s a key year for Big Data as it crosses its biggest chasm yet, but as greater focus comes to this industry I think we’ll start seeing a noticeable push forward – setting up some even more impressive leaps in 2018 and beyond.

ODPi Community Lounge @ Apache Big Data Europe


Join the Discussion at the ODPi Community Lounge

Once again ODPi is sponsoring the Community Lounge at Apache Big Data Europe, November 14-16 in Seville, Spain. Apache project members and speakers are welcome to hold their meetings and after-session discussions there. This is a great way to have a deeper, more intimate conversation with fellow attendees, and to introduce new potential collaborators to your project.

Please choose a time on the Community Lounge Schedule for your topic or project. We’ll help promote your upcoming meeting; be sure to tell your followers as well. Time slots are 30 minutes each and are scheduled on a first-come, first-served basis.

ODPi Community Lounge – ApacheCon EU 2016

Discussion Schedule

Monday, November 14

12:30 – Apache Giraph, Roman Shaposhnik – Discussion session: Practical Graph Processing with Apache Giraph
13:30–15:30 – Lunch
15:30 – Apache MADlib, Roman Shaposhnik (Pivotal) – Distributed In-Database Machine Learning with Apache MADlib (incubating)
16:00 – Apache Geode, Greg Chase – Meet Apache Geode, graduated from the Apache Incubator
Open slots: 10:30, 11:00, 11:30, 12:00, 13:00, 16:30, 17:00

Tuesday, November 15

13:30–15:30 – Lunch
15:30 – Apache Bigtop & Greenplum Database, Greg Chase & Roman Shaposhnik – Discussion: Massively Parallel Data Warehousing in the Hadoop Stack
Open slots: 10:30, 11:00, 11:30, 12:00, 12:30, 13:00, 16:00, 16:30, 17:00

Wednesday, November 16

10:30 – John Mertic (Director, ODPi and Open Mainframe Project, Linux Foundation) – Discussion of the keynote: Lessons from the Trenches: How Apache Hadoop is Being Used & The Challenges Its Users Face
11:00 – ODPi, John Mertic – Discussion: Standardizing data governance across Hadoop distributions
11:30 – ODPi, Roman Shaposhnik and John Mertic – Discussion: Security in Hadoop
12:00 – ODPi, Roman Shaposhnik and John Mertic – Discussion: Streaming data in Hadoop
12:30 – ODPi, Roman Shaposhnik – Discussion: Hadoop Compatible File Systems across Hadoop distributions
13:00 – ODPi, Alan Gates – Discussion: Standardizing Hive in Hadoop distributions
End of conference