All Posts By

Kristen Evans

Checking In: Apache™ Bigtop “Test Drive” Grant Recipients’ Work in Full Swing

By | Blog

Back in April, with the release of ODPi 2.1, we fully transitioned to leveraging Apache Bigtop for our reference implementation and validation test suite needs.

Following such a successful changeover, we launched the Apache™ Bigtop “Test Drive” Grant Program – a new grant funding program designed to increase developer involvement in the Apache Software Foundation (ASF) project.

With this program, ODPi has invested $50,000 to fund developer work from some of the world’s top Apache and big data developers and architects to expand Bigtop’s functionality and usability.

The response to the call for proposals was tremendous, and we at ODPi want to thank everyone for your amazing ideas on how to improve Bigtop. After careful deliberation, the Technical Steering Committee (TSC) selected the following two proposals:

  • The “Automatic cluster orchestration and provisioning” project – led by Dr. Konstantin Boudnik, Chief Technologist of BigData & Open Source Fellow at EPAM Systems – aims to shorten cluster provisioning, which currently takes unacceptably long in development and testing environments. Because this issue affects every developer and quality engineer who needs semi-frequent cluster bring-ups, his progress will be meaningful to all Apache Bigtop users.
  • The “Project Frontier” project – led by Evans Ye, VP of Apache Bigtop & big data engineer at Yahoo! – aims to provide an easy-to-use integration test framework for Apache Bigtop, which doesn’t currently work closely with other Hadoop ecosystem projects on releases and integration tests to ensure that versions of different projects work properly with one another.

In launching this “Test Drive,” our goal is for the results to benefit enterprise end-users and developers and, ultimately, to build a stronger relationship with the Apache Software Foundation and the big data communities hosted by it.

As the program ends on February 2, make sure to come back for individual updates and official outcomes from the developers!

ODPi Takes Big Data Day LA

By | Blog

By Roman Shaposhnik

Earlier this month, I attended Big Data Day LA – a vibrant community gathering of data and technology enthusiasts in sunny Los Angeles. Located on the USC campus, the 5th annual event was organized by local Big Data user groups and volunteers!

Unlike many of the big data industry’s events, Big Data Day LA wasn’t a company-owned conference and registration was fully covered by data-driven sponsors, like Hortonworks, Disney Interactive, WANdisco and more – making the event free for anyone to attend. As such, it wasn’t surprising that it attracted such a big crowd, with more than 1,500 people in attendance!

During the conference, I presented “Big Data on The Rise: Views of Emerging Trends & Predictions from real life end-users” where I offered the audience an overview of key trends emerging in 2017 within the Hadoop and Big Data ecosystem. My session also covered data from the ODPi End User Advisory Board (TAB) and real end-user perspectives on how companies are using Big Data tools, challenges they face and where they are looking to focus investments. My talk was well received by those in attendance and quite a few people approached me following the session to discuss their new understanding of ODPi and how it relates to traditional vendors, the Apache Software Foundation, the Linux Foundation and the enterprise.

The remainder of the conference featured a great selection of talks, especially for data scientists and software developers. Unsurprisingly, the entertainment industry track was a huge hit – featuring talks from Netflix, Warner Brothers, Guitar Center and more.

And, of course, being a Silicon Valley guy, I simply had to check out all the startup buzz at the conference as well. Not only did the conference feature an awesome Startup Showcase track, but there were also quite a few presentations pushing the envelope on state-of-the-art machine learning. Just to give you a quick taste, I suggest you go to http://novamente.ai/ and check out the truly SciFi projects these guys are tackling. Their presentation on how to apply AI to producing ever higher grossing movies (think scripts, casting, visual effects and more) had the audience on the edge of their seats.

A few other sessions I particularly enjoyed include one from Bain & Company, where they tried to put big data and machine learning in the context of the organizational shifts required in any traditional enterprise in order to realize full value from big data insight. On the flip side, even if you have your organization all lined up for digital transformation, you still have to be mindful of challenges on the technical side of machine learning. The notion of a hidden, growing technical debt in these complex, end-to-end machine learning pipelines is something that we all should keep in mind, and Irina Kukuyeva’s presentation did a great job of highlighting some of these important areas.

Overall, Big Data Day LA was a fun and dynamic event that harnessed the upstream community and showcased the importance data has on each level of business.

 

 

Making Production Hadoop Faster, Easier and More Productive: ODPi Member Conversation with Zettaset

By | Blog

In our fifth ODPi Member Conversation podcast, John Mertic spoke with Zettaset’s Director of Product Management, Sesh Ramaswami, and CMO, John Armstrong.

Their multi-perspective discourse covered the security challenges of deploying Hadoop in production, data governance and security expectations, along with how nonprofit organizations like OASIS and ODPi can help drive standards for the big data ecosystem.

Touching on big data security within production environments, Ramaswami dove right into enterprise expectations around data governance and data processing.

Referring to the role data governance plays in the Hadoop ecosystem, Armstrong noted that security wasn’t always a concern, as data was in isolation away from production.

However, the space has changed drastically over the years. Touching on this shift, Armstrong said, “When [enterprises] finally want to move into production, and want to take those data environments and put them into the mainstream… we need to have some real security in place before you plug into the rest of the network —  because Hadoop coexists in large enterprise environments in conjunction with relational database environments, maybe other NoSQL, so it’s not alone and it has to play well with others.”

To hear more of the group’s insight, tune in to the episode on our YouTube channel!

Subscribe to our YouTube channel and follow us on Twitter to catch upcoming episodes of the ODPi Member Conversation podcast series!

ODPi Launches Apache Bigtop Grant Fund Program

By | Announcements

Bigtop “Test Drive” Grant Program to further enterprise-wide production of Apache Hadoop

SAN FRANCISCO, June 13, 2017 – ODPi, a nonprofit organization accelerating the open ecosystem of big data solutions, today announced the Apache™ Bigtop “Test Drive” Grant Program, a new grant funding program designed to increase developer involvement in the Apache Software Foundation (ASF) project. Through the program, ODPi is investing $50,000 to fund developer work with the world’s top Apache and big data developers and architects to expand Bigtop’s functionality and usability.

To apply to participate in the Bigtop “Test Drive” Grant Program, submit proposals here by Friday, July 14 at 11:59pm PST.

“By launching this program, we benefit the enterprise end-user, developers, and ultimately build a stronger relationship with the Apache Software Foundation and the big data communities hosted by it,” said Roman Shaposhnik, VP of Technology for ODPi and Apache Hadoop and Bigtop committer and ASF member. “We encourage the community to apply for a Bigtop grant and work alongside ODPi member companies IBM, SAS, Hortonworks, Splunk, GE and others as we strengthen and extend Apache Bigtop.”

Apply to Apache Bigtop “Test Drive” Grant Program

The ODPi Technical Steering Committee (TSC) is looking for Apache Hadoop ecosystem developers and/or big data practitioners building solutions for in-house or external clients to help further the features in Bigtop. This might be software development, developing new teaching materials, documenting best practices, standardizing APIs or doing research. The program will allow developers to hone their skills, build relationships with leading big data companies and earn monetary compensation. Beyond financial support, the TSC can also provide administrative support, promotion and some collaboration tools.

The program will run for six months, beginning August 1, 2017 and ending February 2, 2018. All participants will be given a dedicated mentor from the TSC and will be required to report their progress to the TSC on a monthly basis (first Thursday of each month). The TSC reserves the right to remove participants from the program.

Applicants will need to write a two- to four-page proposal that describes the Bigtop problem they want to solve and the funding needed to solve the problem. The TSC will review all proposals, and accepted program applicants will be announced Monday, July 31, 2017. Details regarding the proposal, including the submission process, can be found here.

“The ODPi investment means a lot to the Bigtop community,” said Evans Ye, Bigtop’s PMC Chair. “It marks a new milestone that the project not only supports distro vendors, but also, at a higher level, enterprises looking to increase their use of hybrid big data. I believe we’ll have great synergy because ODPi and Bigtop are both committed to making the big data ecosystem more open, connected and relevant.”

A comprehensive packaging, testing, and configuration suite of the leading open source big data components, Apache Bigtop supports a wide range of components/projects, including Hadoop, HBase and Spark. Grants will be awarded for work in the following functional areas: building, continuous integration (CI), testing, deployment, supported platform coverage, and list of supported big data components to expand the platform and set of tools for building standardized big data deployments. By enhancing automation and CI, extending the testing functionality and improving deployment, Bigtop will provide the big data operational predictability that enterprises require.

“In order to confidently expand Apache™ Hadoop® to enterprise-wide production use, businesses need to know that their preferred big data stacks will run predictably and Apache Bigtop can provide this confidence,” said John Mertic, Director, ODPi. “By launching the Bigtop ‘Test Drive’ Grant Program, ODPi is helping to add Bigtop functionality that will make the project even more valuable to enterprise end-users.”

About ODPi

ODPi is a nonprofit organization committed to simplification and standardization of the big data ecosystem with a common reference specification. As a shared industry effort, ODPi members represent big data technology, solution provider and end user organizations focused on promoting and advancing the state of Apache Hadoop® and big data technologies for the enterprise. For more information about ODPi, please visit: http://www.ODPi.org

Making Production Hadoop Faster, Easier and More Productive: ODPi Member Conversation with IBM

By | Blog

In our fourth ODPi Member Conversation podcast, John Mertic met with IBM’s Worldwide Analytics Architect Leader for Data Lakes, Neil Stokes.

Their insight-rich conversation kicked off with a discussion around machine learning and cognitive vision, along with how an ODPi Compliant Hadoop environment relates to cognitive computing.

Stokes, who has spent more than 20 years working at IBM, has a unique perspective on how Hadoop and cognitive systems afford enterprises the ability to ingest and make sense of the rawest possible forms of data, and why systems are only as good as the corpus of data to which they have access.

Referring to these sets of broad, invaluable data, Stokes noted “Interoperability is key here… Groups like ODPi – who have the ability to facilitate that level of interoperability across vendors, across platforms, across different technologies – there is a tremendous value that groups like ODPi bring [to the ecosystem].”

To hear more of Stokes and John’s discussion, tune in to the episode on our YouTube channel!

Subscribe to our YouTube channel and follow us on Twitter to catch upcoming episodes of the ODPi Member Conversation podcast series!

Making Production Hadoop Faster, Easier and More Productive: ODPi Member Conversation with zData

By | Blog

In our third ODPi Member Conversation podcast, John Mertic met with zData’s Senior Solution Architect, Gagan Brahmi.

Their in-depth conversation centered around the struggles commercial and enterprise corporations face with Apache Hadoop deployments, along with why standardization is an important driver in the Big Data community.

Brahmi has a unique perspective on Hadoop’s role in the Big Data industry and offered specific insight around the ways today’s enterprises can derive the maximum value out of the Hadoop cluster and their data.

Touching on how the framework’s capabilities have grown tremendously in the last few years, and urging enterprises to recognize how many tools throughout the ecosystem now complement one another, he noted “When we are dealing with [this] tool to make sure we derive the maximum value out of our existing systems, we have to keep an eye on all the tools available.”

To hear more of Gagan and John’s discussion, tune in to the episode on our YouTube channel!

Subscribe to our YouTube channel and follow us on Twitter to catch upcoming episodes of the ODPi Member Conversation podcast series!

Storage Review: “News Bits: NETGEAR, NAKIVO, Dell, Linux, ODPi, Cisco, Scality, iboss, Microsemi, & More”

By | News

In this week’s News Bits, we look at a number of small announcements, small in terms of the content, not the impact they have. NETGEAR introduces new consumer WiFi system. NAKIVO releases version 7 with support for Hyper-V. Dell releases its lightest, smallest, and most power-efficient entry-level thin client. The Linux Foundation and ODPi launch a free intro to Apache Hadoop course. READ MORE.

Database Trends and Applications: “Riding Two Major Apache Hadoop Trends: Cloud and Standards”

By | News

By: Mike Maciag and John Mertic

Over the past year, we have witnessed a sea change in how enterprises are thinking about Apache Hadoop and big data. Just 12 months ago, a majority of enterprises were still committed to building their own on-premises Apache Hadoop implementations and were agonizing over which distribution to select, weighing which constellation of applications would run with that particular distribution, since there were no standards for Apache Hadoop. Today, a majority of enterprises are thinking about the cloud first, not on-premises, and are increasingly relying on ecosystem standards to drive their Apache Hadoop distribution selection. READ MORE.

ODPi 2.1: a “tick” for the future “tock”

By | Blog

By: Roman Shaposhnik, VP of Technology at ODPi

The release of ODPi 2.1 marks five months’ worth of the ODPi technical community’s diligent work, though on the surface it may appear to be an incremental change from last fall’s 2.0 release. While there aren’t any big, splashy additions to our specification, this release is very noteworthy in its own way. Why? Because it follows in the great tradition of tick-tock releases and invests a lot of energy into the underlying infrastructure that is largely invisible to the consumer. This, of course, makes it a “tick” release, and those are truly foundational to the success of the follow-up “tocks” that get all the excitement. If you still don’t believe tick-tock pairs well with complex systems, ask any Sun Microsystems SPARC engineer how well an alternative release model has worked out for them. I believe they called it humpty-dumpty, but I digress, so back to ODPi 2.1.

One of the biggest underlying changes in ODPi 2.1 is that we have fully transitioned to leveraging Apache Bigtop for our reference implementation and validation test suite needs. This required a lot of upstream backporting. Some of it was pretty straightforward, such as backporting all ODPi-developed tests into Bigtop, while some required us to engage with upstream communities and get their feedback on the best way to accomplish a similar goal. This was the story of our ODPi reference implementation stack for Apache Ambari. It started as a custom stack that was shipped as part of the ODPi reference implementation but, after receiving community feedback, it evolved into a standalone management pack that can now be developed and shipped independently of Ambari. This outcome benefits everybody because now any product based on Ambari can simply point at the management pack and deploy the ODPi reference implementation.

ODPi 2.1 is our first release consisting of just the specifications. All of the software artifacts are also being released as part of the Apache Software Foundation. Such renewed alignment with upstream community efforts allows us to be much more in tune with big data practitioners, regardless of whether they participate in ODPi directly or not. This is a win-win for both ODPi and upstream ASF communities. If Bigtop release 1.2.0 was any indication, ODPi’s focus on enterprise stability and readiness brings to light a lot of issues that would otherwise go unnoticed or would only be fixed in vendor-specific patch releases. ODPi’s Bigtop collaboration brings these issues up closer to the source, creating a feedback loop that results in much faster fixes.

On the flip side, Bigtop’s extensive platform coverage and vibrant community of ASF developers mean the ODPi specification will bring value far beyond what we believe are our core deployment targets. For example, we’ve never really considered IBM’s POWER as a supported ODPi platform, but since Bigtop runs on this hardware, we get it for free. Starting from ODPi 2.1, all of the engineering work will happen directly in the upstream ASF communities, and we expect this to make our development cycle extremely agile and asynchronous. Of course, we’ll continue releasing the specifications, which brings me to the last part of this release.

Most of our effort on the Operations spec was focused on standardizing Ambari 2.5 and taking care of upgrade and backward compatibility guarantees for future ODPi releases. On the Runtime side, we spent quite a bit of time future-proofing it against Hive 2.0 (and looking at how known incompatibilities with Hive 1.2 can affect ISVs and end users). We also considered Spark 2.0 as the next component on which to standardize.

New Special Interest Groups Spark Exploratory Developments

Our Spark 2.0 work was interesting in its own right. Our take was that while Spark was still considered experimental and not at the level of maturity that is required of ODPi Core components, it was still highly important to enterprise readiness. We’re tackling this through a loose construct of Special Interest Groups (SIGs), rather than a highly-rigorous body of a Runtime PMC. Thus, Spark gave birth to our first SIG: Spark and Fast Data Analytics SIG.

With the increase in the popularity and usage of Hadoop and Spark, the notion of Spark replacing Hadoop is gaining traction. While this is possible in some use cases, Spark is already part of Hadoop and there are several components from the Hadoop stack on which Spark depends. Our Spark and Fast Data Analytics SIG, led by Pradeep Roy, advisory software engineer at IBM, is expected to publish guidelines for Spark deployment and recommend best practices on Spark and Hadoop use, along with providing guidelines for different deployment methods for Spark on YARN, Mesos or Spark standalone; comparisons of different SQL on Hadoop solutions; and more.

The formation of two new SIGs, Data Security and Governance SIG and BI and Data Science SIG, quickly followed.

Our Data Security and Governance SIG was formed to provide a place for industry experts to collaborate on a set of best practices aimed at solving the complexities of dealing with multi-tenant big data lakes in a secure fashion and with considerations for control points demanded by enterprise regulatory environments and compliance policies. As the leader of this group, I plan, together with my fellow members, to produce a series of whitepapers and validation test suites addressing both platform considerations and solutions practitioners may need to augment their platform practices. This SIG’s first deliverable will be a Security Guide Handbook, developed on GitHub by members from IBM, Hortonworks and Pivotal, that will bring much needed clarity to securing Hadoop-based data lake infrastructure. We’ve also started working on codifying security-related deployment recommendations as part of the Apache Bigtop deployment capabilities, thus providing baseline functionality around security for the entire Hadoop ecosystem. Stay tuned for our outputs, coming soon!

For our BI & Data Science SIG, according to the group’s champion Cupid Chan, managing partner of 4C Decision, we have a two-fold goal. The first goal is to help bridge the gap between Relational Database Management Systems (RDBMS) and Hadoop so that BI tools can sit harmoniously on top of these systems, while also providing the same, or even more, business insights to the BI users who also use Hadoop in the backend. The second goal is to collaboratively explore ways for Data Science to better leverage the underlying Hadoop ecosystem. To keep the work achievable, the first deliverable for this SIG is a “Data Science Notebook Guideline.” Stay tuned for the release of this group’s findings!

While these SIGs are still very young, they are pushing forward important exploratory work that, we hope, will form a basis for some of the future PMCs and specification updates within the broader scope of ODPi.

These SIGs also represent our lowest barrier of entry to date – so, if you feel like contributing to ODPi efforts but don’t know where to start, we encourage you to join an existing SIG or propose a new one.

By default, SIGs use the odpi-technical mailing list for all online communications between SIG members. This means that all you have to do to join a SIG is drop an email to the odpi-technical mailing list, introduce yourself and briefly describe why you are interested in the SIG activity. Include your GitHub ID in the introductory email so that a SIG Champion can add you to the GitHub group.

Contributing to the ODPi community is that easy!

App Developer Magazine: Get a free intro to Apache Hadoop course

By | News

The Linux Foundation, the nonprofit advancing professional open source management for mass collaboration, today announced its newest massive open online course (MOOC) is available for registration. The course, LFS103x – Introduction to Apache Hadoop, is offered through edX, the nonprofit online learning platform launched in 2012 by Harvard University and Massachusetts Institute of Technology (MIT). This free course will begin in early June. READ MORE.