What Are the Most-Wanted Data Science Skills for 2016?
Data science crowdsourcing specialist CrowdFlower Inc. set out to determine exactly what skills are most in demand by employers seeking to hire data scientists.
In today's Big Data-crazy world, data scientists might be the most-coveted workers in the IT industry, pretty much able to write their own ticket and work wherever they want in the midst of an ongoing skills shortage. Nevertheless, certain skills make potential candidates even more desirable to employers willing to shell out big bucks for the best talent.
To find the most-wanted data science skills, CrowdFlower analyzed job postings on the business-oriented social media site LinkedIn.
"The answer to our question, then, is get your SQL on," said CrowdFlower's Justin Tenuto in a blog post titled "What skills should data scientists have in 2016?" yesterday.
[Note: CrowdFlower has followed up on this information with a complete report on data scientists.]
Yes, despite the strong association with NoSQL databases that marked early Big Data efforts based on Apache Hadoop technology, the industry has seen steady growth in demand for the mainstay query language for RDBMSes. With a large SQL-skilled labor force already in place, SQL-on-Hadoop initiatives have been increasing rapidly of late. In fact, NoSQL came in far behind SQL, at a surprising No. 8. Perhaps even more surprising, SQL was the only skill found to be listed on more than half of the analyzed job postings.
Not so surprising are the other most in-demand skills, starting with Hadoop itself.
"Hadoop, Python, Java, and R round out our top five in-demand skills," Tenuto said. "It's worth noting that we didn't ask about Excel skills and that's still something you see in myriad job listings. Old habits die hard."
Java, of course, is a de facto standard for working with Hadoop, which is itself written in Java. It was also just named "Programming Language of the Year" for 2015 by TIOBE Software.
The R programming language, with its strength in statistics, has seen a resurgence in use concurrent with the Big Data boom, climbing the ranks in language popularity indices.
Python, meanwhile, "is rapidly gaining mainstream appeal as a hybrid of R's fast, sophisticated data mining capability, and a more practical language to build products," according to an article on Fast Company. "Python is intuitive and easier to learn than R, and its ecosystem has grown dramatically in recent years, making it more capable of the statistical analysis previously reserved for R."
The CrowdFlower report mostly jibes with a similar study
The high demand for expertise in these languages and other technologies such as MapReduce, Hive, Pig and so on has made the profession of data scientist "the sexiest job of the 21st century," according to a 2012 article in the Harvard Business Review.
A data scientist, according to HBR, "is a high-ranking professional with the training and curiosity to make discoveries in the world of Big Data. The title has been around for only a few years. (It was coined in 2008 by one of us, D.J. Patil, and Jeff Hammerbacher, then the respective leads of data and analytics efforts at LinkedIn and Facebook.) But thousands of data scientists are already working at both start-ups and well-established companies. Their sudden appearance on the business scene reflects the fact that companies are now wrestling with information that comes in varieties and volumes never encountered before."
CrowdFlower, which farms out different parts of data-related jobs to a network of contributors who work on small parts of an overall project -- such as categorizing raw data points -- came up with its list by examining nearly 3,500 job postings on LinkedIn. "We looked at postings for data scientists and analysts at every level -- from intern to associate to senior to VP -- then asked our contributors to visit each posting and note which common skills were present in each," Tenuto said.
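For readers curious how such a tally might work once contributors have noted the skills in each posting, here is a minimal Python sketch. It is not CrowdFlower's actual method or data: the skill list, the sample postings and the function names are all hypothetical, chosen only to illustrate counting how many postings mention each skill and ranking the results by share of postings.

```python
from collections import Counter

# Hypothetical list of skills to track; not CrowdFlower's actual list.
SKILLS = ["SQL", "Hadoop", "Python", "Java", "R", "NoSQL"]


def tally_skills(postings):
    """Count how many postings were tagged with each tracked skill.

    `postings` is an iterable of collections of skill names, one per
    job posting, as a contributor might have noted them.
    """
    tracked = set(SKILLS)
    counts = Counter()
    for tagged in postings:
        # Use a set so a skill counts at most once per posting,
        # and ignore skills that are not on the tracked list.
        counts.update(set(tagged) & tracked)
    return counts


def share_of_postings(postings):
    """Return (skill, fraction of postings) pairs, most common first."""
    postings = list(postings)
    counts = tally_skills(postings)
    total = len(postings)
    shares = [(skill, counts[skill] / total) for skill in SKILLS]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


# Made-up sample data, for illustration only.
sample_postings = [
    {"SQL", "Python"},
    {"SQL", "Hadoop"},
    {"Java"},
    {"SQL", "R", "Excel"},
]

ranked = share_of_postings(sample_postings)
for skill, share in ranked:
    print(f"{skill}: {share:.0%} of postings")
```

Counting each skill once per posting, rather than once per mention, matches the way the findings are reported: as the share of postings that list a given skill.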
Lukas Biewald, CrowdFlower's CEO, offered the following observations to ADTmag based on the findings:
- It's interesting that familiarity with data storage is at the top of the desired skills -- Hadoop and SQL. This is probably due to the fact that data storage technologies are more consolidated than applications or programming languages.
- Clearly anyone wanting to get into data science needs to learn about databases. With all of the hype around machine learning, sometimes we forget that you can't do anything without data.
- Python was the most popular programming language among companies looking for data scientists even though it's not the first tool you think of when it comes to scientific computing. This reflects the fact that data scientists are spending more and more of their time enriching their data rather than analyzing it.
Whatever they're spending their time doing, they're getting well paid for it.
According to a report from the federal Bureau of Labor Statistics, "computer and information research scientists" received a median salary of $108,360 per year in 2014. They also have pretty good job security, with the position forecast to grow at a rate of 11 percent through 2024, significantly higher than the average forecasted growth rate of 7 percent for all occupations.
David Ramel is the editor of Visual Studio Magazine.