Stack Overflow's New Data Trove Shows Devs Who Indent with Spaces Earn More Money

Stack Overflow, the go-to Q&A site for many programmers, has released the raw results from its huge developer survey, letting users mine the treasure trove of data for insights such as: Those who indent with spaces earn more money than those who indent with tabs.

That may be a reach, but Stack Overflow's David Robinson seems to have proven the correlation -- if not causality -- in a blog post he penned yesterday to announce the release of data from the company's recent "Stack Overflow 2017 Developer Survey."

To illustrate what can be done with the CSV data set collected from some 64,000 developers -- reportedly the largest such survey ever conducted -- Robinson put his R programming language skills to work on the tabs-vs.-spaces "holy war." Though his conclusion was surprising, he controlled for several factors that could have skewed the result and backed up his research with a GitHub project that details his methodology -- complete with code in an R markdown file.

He used a linear regression that predicted salary based on these factors:

  • Tabs vs spaces
  • Country
  • Years of programming experience
  • Developer type and language (for the 49 responses with at least 200 "yes" answers)
  • Level of formal education (for example, bachelor's, master's, doctorate)
  • Whether they contribute to open source
  • Whether they program as a hobby
  • Company size
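For readers who want a feel for this kind of analysis, here is a minimal sketch in Python rather than Robinson's R. It is not his code: the data are synthetic, the 5 percent "true" effect is made up for illustration, the only control is years of experience, and the helper name fit_spaces_effect is hypothetical. It also assumes the model is fit on the logarithm of salary, which is one common way to read a coefficient as a percent difference; the article does not say which form Robinson used.

```python
import numpy as np


def fit_spaces_effect(log_salary, uses_spaces, controls):
    """Ordinary least squares of log salary on a spaces indicator plus controls.

    Returns the estimated percent salary difference for spaces vs. tabs,
    holding the control columns fixed.
    """
    n = len(log_salary)
    # Design matrix: intercept, spaces indicator (1 = spaces, 0 = tabs), controls.
    X = np.column_stack([np.ones(n), uses_spaces, controls])
    coef, *_ = np.linalg.lstsq(X, log_salary, rcond=None)
    # A coefficient b on log salary corresponds to a (exp(b) - 1) relative change.
    return (np.exp(coef[1]) - 1.0) * 100.0


# Synthetic data only: none of these numbers come from the survey.
rng = np.random.default_rng(0)
n = 5000
years = rng.uniform(0, 20, n)                      # years of experience
uses_spaces = rng.integers(0, 2, n).astype(float)  # 1 = spaces, 0 = tabs
true_effect = np.log(1.05)                         # made-up 5 percent premium
log_salary = (
    10.5
    + 0.03 * years
    + true_effect * uses_spaces
    + rng.normal(0, 0.1, n)
)

pct = fit_spaces_effect(log_salary, uses_spaces, years.reshape(-1, 1))
print(f"Estimated spaces premium: {pct:.1f}%")
```

A real analysis of the survey CSV would add the categorical factors listed above (country, education, company size and so on) as indicator columns in the same design matrix.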

In the end, his model estimated that using spaces instead of tabs was associated with an 8.6 percent higher salary.

"So ... this is certainly a surprising result, one that I didn't expect to find when I started exploring the data," Robinson said. "And it is impressively robust even when controlling for many confounding factors. As an exercise I tried controlling for many other confounding factors within the survey data beyond those mentioned here, but it was difficult to make the effect shrink and basically impossible to make it disappear."

[Figure: Investigating Programming Language Correlations (source: Stack Overflow)]

Perhaps somewhat surprisingly, this isn't the first time the weighty question of tabs vs. spaces has been addressed through data analysis. Last year, Google developer advocate Felipe Hoffa showed off the capabilities of the company's cloud-based BigQuery data warehouse by analyzing some 1 billion files across 400,000 GitHub repositories to see if developers prefer tabs or spaces to indent their code. They preferred spaces.

Also, Stack Overflow's 2015 developer survey found 45 percent of respondents preferred tabs, while 33.6 percent preferred spaces.

So while the question of which method is preferred remains open based on these differing conclusions, there now seems to be statistically valid evidence that developers who prefer spaces make more money than their tabbing brethren.

"Correlation is not causation, and we can never be sure that we've controlled for all the confounding factors present in a dataset," Robinson said in concluding his post. "If you're a data scientist, statistician, or analyst, I encourage you to download the raw survey data and examine it for yourself. In any case we'd be interested in hearing hypotheses about this relationship."

About the Author

David Ramel is an editor and writer for Converge360.