I had a good day last week presenting to the audience at the Global Big Data Summit in Santa Clara. The tail end of the the last day of any conference is a bit slow, but was thrilled when many came barreling in right as I was ready to start working through my slidedeck which spoke to the point of the importance of standards, like ODPi, in driving future investment in Big Data and Apache Hadoop.
I had one critical question after the talk that I thoroughly enjoyed answering. A gentleman pushed back on my point that standards need to be the focus. In his experience, staff training and education were the biggest concerns and it didn’t make sense to focus on standards until a critical mass of developers and practitioners were properly trained first. It was a fair argument, and one that Gartner has found as a key blocker to Apache Hadoop growth as well, but to me one that tries to treat the symptom more than the core issue, and I pushed back saying that standards enables better education and enablement. My point made sense to him, but I walked away wanting to discuss this more in a blog with better data points behind it. After all, we are in the data industry here and should be data driven!
If there is one industry where standards are at the forefront, it’s education. Education standards are a very touchy subject (disclaimer here – I’m a parent of 4 school aged children and good friends with several educators) and while I’ll attempt to steer clear of the execution for this article, the concept of what is trying to be driven makes perfect sense. Does what skills a first grader have in one state equate with another state? What are reasonable benchmarks for defining competency? Can trends in learning/teaching method and outcomes be better correlated?
I came across an interview with a leader in educational standards entitled “How and Why Standards Can Improve Student Achievement: A Conversation with Robert J. Marzano”. The interviewee made some interesting insights which drew parallels to the critical question I received at the talk. Here’s a few quotes from the interview and their relation to Apache Hadoop standards:
“Standards hold the greatest hope for significantly improving student achievement. Every other policy mandate we’ve tried hasn’t done so. For example, right after A Nation at Risk (Washington, DC: U.S. Department of Education, 1983) was published, we tried to increase academic achievement by making graduation requirements more rigorous. That was the first wave of reform, but it didn’t have much of an effect.”
This makes a great point – creating a measuring stick for competency without some sort of standard to base education from hurts more than it helps.
The interviewer goes on to ask about what conditions are needed to implement standards.
“Cut the number of standards and the content within standards dramatically. If you look at all the national and state documents that McREL has organized on its Web site (www.mcrel.org), you’ll find approximately 130 across some 14 different subject areas. The knowledge and skills that these documents describe represent about 3,500 benchmarks. To cover all this content, you would have to change schooling from K–12 to K–22. Even if you look at a specific state document and start calculating how much time it would take to cover all the content it contains, there’s just not enough time to do it. So step one toward implementing standards is to cut the amount of content addressed within standards. By my reckoning, we would have to cut content by about two-thirds. The sheer number of standards is the biggest impediment to implementing standards.”
Lots and lots of content and things identified to learn across a diverse set of subject areas, with a finite time to turn out individuals competent in the space. Seem similar to the situation in the Apache Hadoop ecosystem?
The interviewer then follows up with asking how can you do this with knowledge continuing to expand.
“It is a hard task, but not impossible. So far the people we’ve asked to articulate standards have been subject matter specialists. If I teach music and my life is devoted to that, of course I’m going to believe that all of what’s identified in the national documents is important. Subject matter experts were certainly the ones to answer the question, What’s important in your content area? To answer the question, What’s absolutely essential? you have to broaden that population dramatically to include all constituents—those with and without college degrees.”
This response aligns very well with the ODPi approach to creating Apache Hadoop standards. We aren’t in the business of creating full end-to-end comprehensive standards of what an Apache Hadoop Platform should offer, or an Apache Hadoop-Native Big Data Application should adhere to, but instead focus on what’s truly important to provide that base level – those essential pieces for what a platform should offer. And I particularly like the last point “expanding the scope of the conversation around standard to get diverse opinions and experiences,” which is something ODPi is uniquely positioned to drive.
One last quote, which I think shapes the “Why?” on this effort.
“Whether we focus on standards or not, we’re entering an era of accountability that has been created by technology and the information explosion.”
The enterprise has the same expectations – they want to lower risks in Big Data investments, which those risks are a byproduct of not having staff to manage them. Fortune 500 executives need this in place to have any confidence in this technology, which the abysmal adoption rates have shown to be problematic. In short, Apache Hadoop needs to be accountable for its enterprise growth.