In-Depth

Got Big Data Skills? Write Your Own Ticket

The ongoing Big Data skills shortage is providing great opportunities for developers with Hadoop and related experience.

What the heck are you doing reading this article? You should be boning up on your Big Data developer skills. Well, if you like making the big bucks, that is.

Yes, the Big Data skills shortage shows no signs of shrinking even after several years of hype. That means great opportunities for data developers.

"By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions," stated a recent McKinsey Global Institute report.

And where there's hype, there's money. "Salaries reported by those who regularly use Hadoop, NoSQL, and Mongo DB are all north of $100,000," claimed a recent report from the 2013-2012 Dice Salary Survey.

That's borne out by the Indeed.com salary search tool, which shows "Hadoop" jobs coming in at $109,000, and "Hadoop Software Engineer" averaging $124,000. That's compared to "software engineer" at $94,000.

Of course, it all comes down to supply and demand, and some interesting tools to track such information can be found at Stack Overflow, a resource well-known to developers (for example, a recent Slashdot.org posting was titled "Developers May Be Getting 50 Percent of Their Documentation From Stack Overflow").

I asked Stack Overflow about this issue and got the following response from Bethany Marzewski, marketing coordinator: "I ran a quick query through our database of 106,000 developer profiles (worldwide) and found that of these, less than 1 percent (only 951) have listed Hadoop as one of their technologies.

"Comparatively, of the 1,589 job listings on our job board, a search for 'Big Data' returns 776 open roles--nearly 50 percent," Marzewski said. "A query for jobs seeking programmers with Hadoop experience yields 90 open jobs (nearly 6 percent), and a search for 'machine learning' yields 115 open roles." You can view the site's job board to run your own queries.

And developers are taking note, looking for that very documentation mentioned in the Slashdot posting. Take a look at Figure 1, provided by Stack Overflow developer Kevin Montrose, who ran a query to chart questions on the site tagged with "Hadoop" for each month since 2008.

Figure 1. Stack Overflow Hadoop Questions Over Time

But what else can developers do to position themselves to become highly paid Big Data rock star programmers? Well, for one thing, maybe don't focus on the "programmer" part so much.

"The ability to apply technology and functions to business use cases is very important in Big Data," Joe Nicholson, vice president of marketing at Big Data vendor Datameer Inc., said. "This includes understanding data types and what insights they can provide, especially when they are correlated with other data types. Historically, programmers often knew little about what their companies did, they just focused on technology. Increasingly so, companies are looking for a combination of business and technical skills."

In addition to those with business skills, companies are also looking for a new breed of data specialist, the data scientist, rather than traditional software engineers. This distinction was emphasized by Mark Herschberg, CTO at Madison Logic, which provides data-powered lead solutions for companies. He's forming a data science team in New York, and he is finding there is more of a shortage of data scientists than engineers.

"A software engineer can't simply become a data scientist in the same way a Java developer can become a Ruby developer," Herschberg said. "A good data scientist has a combination of three different skills: data modeling, programming and business analysis. The data modeling is the hardest. Most candidates have a master's degree or Ph.D. in math or science and have worked with various statistical models. They have programming skills--not so much the type to let you build a scalable enterprise system, but in that they can access the database and move data around. They are probably better at R and SciPy (a Python library) than at building a Web application. They also are familiar with tools like Hadoop and NoSQL databases. Finally they have some basic business sense so will know how to ask meaningful business questions of the data.

"If a software engineer is serious about moving into data science, he or she should probably begin by taking some classes in advanced statistics and data modeling," Herschberg said.

Bill Yetman, senior director of engineering at Ancestry.com, offered another take on the ideal Big Data candidate. "Developers need to approach new technologies and their careers with a 'learning mindset,'" Yetman said. "Always be willing to pick up something new, embrace it and master it. Developers who love to learn will always stay up to date and be marketable."

And of course, experts agree that hands-on experience is invaluable--along with demonstrating that experience. In fact, participating in developer forums and related question-and-answer sites or code repositories is becoming more important for companies looking to find the best developers--using, of all things, Big Data analysis.

Jon Rooney, director of developer marketing at Big Data vendor Splunk Inc., concurs. "There's no substitute for hands-on experience," Rooney said. "Developers who show experience by writing code and posting their work on places like GitHub are always marketable."

And to get comfortable writing that code, Syncsort Inc.'s Steven Totman advised developers to "download a VMware version and start playing and get some real experience using a cluster (for example, fire up a few nodes on Amazon)." Totman, director of strategy at the data-related software vendor, also emphasized the importance of credentials. "But to stand out from the crowd," Totman said, "it's critical to take training and get certified. All of the major Hadoop distributors--Cloudera, Hortonworks, MapR--offer training and certification courses."

Cloudera Inc., in fact, recently announced an academic partnership "to provide accredited, nonprofit universities access to Cloudera's industry-leading products to help streamline and accelerate adoption of Hadoop."

Other training opportunities abound on the Web. "Coursera has some excellent Machine Learning courses that we've used to get our entire dev team literate in this area," said Will Cole, product manager at Stack Overflow. "We've had some people go to meetups (usually through meetup.com) around these topics. And of course, the more concrete way to market yourself is to build side projects or contribute to open source projects where you can take what you've learned and show some working production results you've achieved."

On the other side of the Big Data skills shortage are companies looking to hire--the demand side of the supply-and-demand equation. They're increasingly finding new and innovative ways to find talent beyond the traditional methods.

Some companies that can't find suitable Big Data talent through job sites and external recruiting might be advised to look inward. Yetman said, "We identify talented developers within the organization who want to work on the Big Data projects and train them. This takes time and needs to be factored into the project. For some key positions (that is, data scientists), we have hired contractors instead of full-time employees."

Totman noted that "companies seem to be getting smarter and trying to hunt lower in the food chain. They are recognizing that good core logical processing skills ingrained in math/science graduates can be repurposed and it's as important to have people who know data as it is to learn the new tooling.

"Companies are also realizing that existing skills in their data warehouse teams, especially ETL, can be repurposed. These guys already know the data--how to move and transform it--they just need training/tooling that can take advantage of Hadoop," Totman said.

But looking in-house doesn't always work. "Many companies seem to be looking outside of their in-staff talent to find Big Data developers, which involves everything from running external hackathons to engaging at local meetups," Rooney said.

Cole noted that companies are "becoming more proactive in their search. Job listings alone won't cut it. Employers are getting more sophisticated in how they filter candidates on our system [the Stack Overflow Candidate Search database] and those like it (LinkedIn)," he said. "They're also resorting to old-fashioned networking at meetup events. If you attend one, you'll notice that nearly all of these events are being sponsored by companies looking to hire data scientists and are hoping to meet someone at the event."

When other methods are exhausted, some companies look overseas, relying on another formerly overhyped IT industry fad: outsourcing. In fact, one recent article was titled, "The Big Data Talent Shortage: Are H1-B Visa Holders the Solution?" Yetman noted the use of that controversial strategy. "Companies are looking outside the United States to find developers," he said. "There are talented Big Data developers in Europe, Russia and elsewhere that are willing to come to the U.S. Companies are sponsoring H-1Bs to bring these developers over and work in the U.S. on Big Data projects."

Herschberg, on the front lines in the hunt for data scientists and engineers, attributed the problem to the U.S. education system. "It should be noted that many candidates I see are foreign born, since American education isn't emphasizing math skills to our youth," he said.

Meanwhile, the industry has been hard at work making Big Data analytics more accessible and usable to companies of all sizes (see http://adtmag.com/articles/2013/05/09/big-data-product-watch.aspx). There has been a plethora of recent announcements about customized Hadoop distributions, Big Data appliances, specialized applications and other mechanisms to bring the technology to the corporate masses.

"For those that can't afford the robust data scientist departments of Netflix and Amazon, this is the most viable option," said Omer Trajman, vice president of field operations at WibiData, which sells customized Big Data applications.

So you can expect more announcements of such products, as the Big Data wave doesn't look to be cresting anytime soon.

"The success of companies like Facebook, LinkedIn, Google and Ancestry.com with Big Data is fueling the adoption of these technologies," Yetman noted. "Big Data technologies are moving from an early adopter phase to an early majority phase. Companies recognize that their data can provide a competitive edge. The lack of qualified engineers is slowing companies down, but they are still embracing the change and moving forward."

However, that movement forward may come at great cost, observed Matt Mueller, president of CBIG Recruiting & Staffing, headquartered in Rosemont, Ill. "Instead of training existing staff, companies would rather go out and hire experience," Mueller said. "Since Hadoop is relatively new, the number of folks with Hadoop skills is scarce and the ones who do have the skills are paid well, aggressively recruited and can command compensation that most companies don't want to pay."

The gravy train for Big Data and Hadoop specialists won't last forever, though. "Just like any other tool or technology, something will replace Hadoop and the demand will decrease as more people do pick up the tool," Mueller said.

So, for goodness' sake, close that browser and get to work, already!