Project Frontier: Shaping the Next Generation Hadoop Build Framework of Apache Bigtop

By Evans Ye, Yahoo Taiwan

As a mature Apache top-level project, Apache Bigtop has now been around for 6 years, serving as a critical component for building Hadoop distributions running in production. From on-premises deployments to big data solution vendors to cloud providers, Bigtop has been widely leveraged in the big data world.

Yet today that world is growing even more complex. Having started with only a handful of components (HBase, Hive, Pig, Oozie, etc.), the latest release of Bigtop now includes more than 30 components. To handle such complexity, developers need to make sure a patch won’t break components that are integrated together, and release engineers also need to ensure features are fully functional. This is why we initiated Project Frontier, funded by ODPi.

Project Frontier focuses on extending and hardening the feature that Bigtop was originally designed for: building Hadoop distributions. Bigtop can only produce high-quality distributions by working closely with upstream projects to solve integration problems across multiple Hadoop ecosystem projects.

Based on our observations of the existing Bigtop build frameworks, we set the following goals for Project Frontier:

  1.  Provide a one-stop seamlessly integrated build pipeline
  2.  Document examples as reference implementations
  3.  Create better documentation for iTest, Smoke Tests, and the other test frameworks

These goals all revolve around one core mission of Project Frontier: make Bigtop extremely friendly to use. The industry needs a simplified integration test framework for Apache Bigtop. We need a better way for Apache Bigtop to work with other Hadoop ecosystem projects, with release and integration tests that ensure versions of different projects work properly with one another.

For example, one of the scenarios we’d like to support is that a developer simply submits a commit SHA1 containing a newly developed feature, and the framework handles the rest to produce an integration test report. That’s how simple it is.

Project Frontier Feature Preview

To tackle these ambitious goals, we will develop the features and functionality of Project Frontier in phases. The initial phase focuses on improvements to building components in Bigtop. Let’s preview a feature that will be available in the upcoming Bigtop 1.3 release. On Bigtop’s master branch, users can now run the following commands from the Bigtop repository to build a component.

Let’s say Hadoop:

$ git clone https://github.com/apache/bigtop.git

$ cd bigtop

$ ./gradlew hadoop-pkg-ind

That’s it. Bigtop will take care of the full build environment and its dependencies for you. The advantages of this new feature are:

  1.  It abstracts away the tedious work that would otherwise require direct user attention
  2.  Gradle targets can now be chained like this:

$ gradle hadoop-pkg-ind docker-provisioner

which builds Hadoop and deploys it as a testing cluster.
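As a rough illustration of how such target names compose, here is a minimal sketch. It assumes other components follow the same `<component>-pkg-ind` naming as `hadoop-pkg-ind`; the helper name `pkg_ind_targets` is hypothetical, and `hbase` is used only as an example component. The script prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Bigtop): turn component names into
# "<component>-pkg-ind" target names, mirroring the hadoop-pkg-ind target.
# Assumption: other components expose a target with the same suffix.
pkg_ind_targets() {
  targets=""
  for component in "$@"; do
    # Append with a separating space only when the list is non-empty.
    targets="${targets:+$targets }${component}-pkg-ind"
  done
  printf '%s\n' "$targets"
}

# Print the resulting command instead of executing it, so the sketch
# can be tried outside a Bigtop checkout.
echo "./gradlew $(pkg_ind_targets hadoop hbase)"
```

To actually start a build, the printed command would be run from the root of the Bigtop repository.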

We’re still polishing the feature to support more customizations, for example adding the ability to build packages with Nexus server support. Many more features are under development, so share your input and get involved. The Bigtop community welcomes all kinds of contributions, from code to docs, tests, and discussion. Learn more by visiting our page on GitHub. Join us now to shape the way we build and integrate the Big Data ecosystem!

 

Evans Ye is a PMC member and former Chair of Apache Bigtop, and leads the Project Frontier initiative for ODPi. He works at Yahoo Taiwan to develop E-Commerce data solutions. Ye loves to code, automate things, and develop big data applications.
