
The success that ODPi has achieved as a nonprofit organization committed to the simplification and standardization of the big data ecosystem is driven by the dedication of our member organizations and individuals. Today we are introducing the first interview in a series of conversations with key ODPi contributors, exploring why they participate in ODPi. We seek to learn more about the individuals whose efforts are accelerating the development of today’s big data ecosystem, standards and solutions.
We recently sat down with Scott Gnau, chief technology officer of Hortonworks, to discuss his involvement with ODPi and what value Hortonworks gains from being an active member. Scott has spent his entire career in the data industry. As CTO, he is currently responsible for the company’s overall technology vision and oversees its Engineering, Product Management and Support organizations.
ODPi: Thanks for joining us today, Scott. Can you start by sharing a little background on your involvement with ODPi?
Scott: I’ve been involved with ODPi since its founding many years ago, when I worked for a different company that was one of the original founding members. The whole notion behind ODPi is to drive a broader sense of community and a level of standardization around some of the core technology in our space. Ultimately, that’s a really good thing. If we can make a common kernel for software developers, it makes it easier for them to build their applications and take advantage of the technology that’s being developed as part of the larger Hadoop community. ODPi helps them do that with a level of confidence and compatibility.
ODPi: So what has your role been with ODPi? Tell us how you and Hortonworks have gotten involved with the contributions, collaboration and achievements of the organization.
Scott: As for our contributions at Hortonworks, we invest in ODPi not only through the dedicated people who participate, but also through the common code contributions that are donated and then managed by the foundation. Our participation in ODPi is really an extension of our belief that it’s important to invest in the community and to collaborate within it.
We’re well known for being an open source company. Our membership with ODPi is not really just about being open source, but also about how we can foster a sense of community that is open, with contributions to the effort coming from all over the industry. That collaboration is what I think helps make our market function properly, and ODPi is one way that represents how Hortonworks invests in that community.
ODPi: What is the real value of the work that ODPi is driving, and what’s the importance of providing a vendor-neutral home for these efforts?
Scott: The big thing is that with any project of this nature, you look at what’s going on in the market and try to respond and/or stay ahead. One of the cool things that we’re working on inside of ODPi now is a much larger focus on the whole notion of data governance, and support of Apache Atlas. When I think about how ODPi has helped the industry, ODPi drove consolidation of all the different distributions of Hadoop and created some standards in the market. That’s a really good thing, from a compatibility perspective as well as support and all those things.
With that work now largely complete, we think the next big thing the industry needs to tackle is data governance. Data governance is still a largely unsolved problem, and that’s with known, structured, transactional kinds of data. The world we live in now, with connected sensors, IoT and big data, brings highly unknown and highly variable data that may even have been created outside of the corporate firewall. So data governance not only becomes more important, because we need to know what the data is, where it came from and who has access to it, but it also becomes a more complicated problem to solve.
We really think that it’s important for Hortonworks to leverage our position in the community and work with the community at large to help create some semblance of standards-based approaches for metadata tagging and governance across this wide variety of data. ODPi is a really great place to do that because it includes multiple different vendors, some of whom compete with each other, coming together to help define standard constructs that can be leveraged across the industry.
ODPi: What excites you personally regarding the technical work being done in ODPi this year?
Scott: The notion of data governance is a big deal. There is a lot of really cool technology emerging. Just look at the interesting algorithms being built in deep learning, or the whiz-bang use cases. In the end, none of those solutions are sustainable if the data is not trusted and we can’t prove out its lineage. So I’m very excited about the current ODPi focus; governance is a hot button for me. While it may not seem like the sexiest thing out there, it’s the core component that’s going to make all of those interesting use cases viable over the long term.
I couldn’t be more enthusiastic about the value ODPi has created since its inception, and the support we’re getting from ODPi and its member community. From creating a semblance of common kernel components inside the core Hadoop stack, to the much more robust current focus on data governance, ODPi as a nonprofit, neutral party is positioned to continue having a great impact.