So You Want To Become a Data Scientist?
So you want to become a data scientist? And why wouldn't you? Careers site Glassdoor just named it the "best job in America" for 2016. So we decided to talk to a real-life data scientist to find out what the job's all about and how to become one.
But first, what makes it the No. 1 job in the entire country? Well, you can start with the median base salary of $116,840, partly attributable to a prolonged shortage of data scientists, itself a product of the Big Data craze that followed the advent of the Apache Hadoop ecosystem. It's so hard to find these elusive creatures that many enterprise Big Data initiatives are being held back, depriving companies of what could be a crucial competitive advantage.
Then you can consider the number of data scientist job openings, which totaled 1,736 at the time of Glassdoor's blog post last week. Glassdoor's methodology also included scores for "Career Opportunity," which were combined with the other metrics to produce an overall "Job Score," a ranking that data scientists topped.
But what exactly do data scientists do?
According to Wikipedia, "Data Science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured, which is a continuation of some of the data analysis fields such as statistics, data mining, and predictive analytics, similar to Knowledge Discovery in Databases (KDD)."
According to IBM, "A data scientist represents an evolution from the business or data analyst role. The formal training is similar, with a solid foundation typically in computer science and applications, modeling, statistics, analytics and math."
Those definitions are all well and good, but we decided to talk to a real data scientist working in the front-line trenches, so to speak, to find out what the job is really all about, and how aspiring data scientists can work their way into the profession.
For that, we solicited input from Mark Schwarz, vice president of Data Science at Square Root Inc., an Austin, Texas-based Software-as-a-Service (SaaS) company that leverages data science and agile development to build enterprise class technology solutions, according to its site. Schwarz earned a B.S. in computer science from Texas A&M University and worked his way up to his current title through a procession of internships and developer positions, becoming vice president of data science at Square Root in 2014.
Following is an e-mail Q&A with Schwarz, who discusses his day-to-day responsibilities and offers advice for those wishing to break into the profession:
- How is a data scientist different from a data developer?
A data scientist turns information into actionable next steps -- a forecast, a cleaner classification or a statement. A data developer turns bits into information. A data developer's work is just as valuable, more common, and a whole lot less visible than a data scientist's.
- How does a data scientist spend most of the working day?
A data scientist spends most of his/her working day communicating. Only about 10 percent of the time is spent on what most people think of when they think "data science" -- the model selection and training. 50/40/10 splits -- 50 percent feature engineering, 40 percent feature selection and 10 percent model selection -- like the one from Gert Jacobusse are pretty common. In a commercial setting (or in a competitive one like Kaggle), feature creation and feature selection are very interactive. A data scientist will spend lots of time tuning their opinions, their stories and their models to match reality.
- What tools or software do you work with the most?
On the Square Root team, Python, R and SQL are common.
- What are some interesting projects you've been working on?
Lately, we've been working on clustering retail stores by market opportunity. We're using the clusters to drive targeted profit and operational improvements in our store relationship management (SRM) platform, CoEFFICIENT. It's fun!
- What's the biggest challenge you've faced in data science?
The biggest challenge is to measure the change that comes from data. Data is well-placed to change perspectives. To prove its value, it needs to change behaviors, too. It's always difficult to measure how well a particular data set actually changed a person's behavior, but that's what's most interesting to me.
- What's the most important trait of a successful data scientist?
A successful data scientist is curious, creative, a skilled technologist and a clear communicator.
- What do you think of all the industry initiatives to put Big Data analytics into the hands of "ordinary business users," as opposed to highly skilled and trained data scientists or developers?
I think it's admirable, although not well-defined. The "science" part of data science involves being disciplined enough to ask and answer the right questions. "Big Data analytics" can encompass that rigor. When that term is used, though, scientific rigor is often missing or under-emphasized.
- What do you see as the best way to meet the heavy demand for data scientists?
The best way to meet heavy demand is to keep talking about it! More illuminating stories will build stronger and stronger support for healthy data science organizations.
- What advice do you have for aspiring data scientists?
Do data science. Joel Grus emphasizes this in his recent book ["Data Science from Scratch: First Principles with Python," O'Reilly Media, 2015]. Start with a question you're personally curious about and then dig in. If that means generating your own data, do it! When a person has a thoughtful portfolio of things they've tried in the past, I'm always interested to read what they've tried. In this still-nascent industry, work examples are a powerful way of communicating how capable you are.
- Where do you see data science going in the future?
Up and to the right.
- Bonus question: What did you do before Hadoop (and what will you do after)?
Before Hadoop I started with good hypotheses to test. We'll still be doing that after it's passé. Technology changes quickly. Other technologies, such as Facebook's Presto (which is loosely Hadoop/HDFS-based) are already circling.
Other benefits of the profession should also be taken into account, such as work-life balance: data scientists also came in No. 1 in Glassdoor's evaluation last fall of occupations that offer the best such balance. Incidentally, since that October report, the reported salary has gone up a couple thousand dollars, while the number of job openings has decreased by a few hundred.
Nevertheless, data scientist job opportunities are likely to keep growing, according to the Occupational Outlook Handbook published by the Bureau of Labor Statistics (BLS), a unit of the U.S. Department of Labor.
"Employment of computer and information research scientists is projected to grow 11 percent from 2014 to 2024, faster than the average for all occupations," the BLS said. "The research and development work of computer and information research scientists turns ideas into industry-leading technology. As demand for new and better technology grows, demand for computer scientists will grow as well."
For more on the specific skills sought by employers, read the recent article, "What Are the Most-Wanted Data Science Skills for 2016?"
David Ramel is the editor of Visual Studio Magazine.