Text-Mining Research Helps Mobile App Developers Address Issues

Statisticians at Cornell University have developed a new text-mining approach that can reportedly help mobile app developers sort through app reviews and zero in more quickly on problems that need immediate attention.

The approach combines the aggregation and parsing of customer reviews into one step to more quickly provide actionable insights culled from feedback. Cornell said the approach improves upon the usual Bayesian modeling methodology by using a model based on weighted averages of words that appear in reviews.

That is said to simplify the common practice of analyzing text with large matrices of such words. Those matrices are ungainly, super-wide representations, which the new methodology shrinks down.
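To see why word matrices get so wide, here is a minimal sketch of the standard bag-of-words representation, in which each review becomes a row and each distinct word becomes a column. The three reviews are invented for illustration and the helper name `to_row` is hypothetical; none of this comes from the paper.

```python
from collections import Counter

# Hypothetical toy reviews, invented for illustration only.
reviews = [
    "app crashes on login",
    "login fails after update",
    "great prices and easy booking",
]

# Vocabulary: every distinct word across all reviews, one column each.
vocabulary = sorted({word for text in reviews for word in text.split()})

def to_row(text):
    """Return one review's word counts, ordered by the vocabulary."""
    counts = Counter(text.split())
    return [counts[word] for word in vocabulary]

# One row per review, one column per vocabulary word.
matrix = [to_row(text) for text in reviews]

print(len(matrix), "rows x", len(vocabulary), "columns")
```

Even three short reviews need a column for every distinct word, and most cells are zero. With a corpus of many thousands of reviews the column count grows with the vocabulary, which is the width problem described above.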

"The idea was, can you devise a method that would look through all the ratings, and say these are the topics people are unhappy about and this is maybe where a developer should focus," said Shawn Mankad, assistant professor of operations, technology and information management in the Samuel Curtis Johnson Graduate School of Management.

He is the lead author of "Single Stage Prediction with Embedded Topic Modeling of Online Reviews for Mobile App Management," which will appear in an upcoming issue of the Annals of Applied Statistics. The paper's co-authors are Cornell doctoral candidate Shengli Hu and Anandasivam Gopal of the University of Maryland.

The model produces weighted averages of the words that appear in online reviews, with each weighted average representing a topic of discussion. In addition to providing guidance on a single app's performance, the method is said to allow comparison to competing apps over time to benchmark features and consumer sentiment.
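As a rough illustration of a topic expressed as a weighted average of words, the sketch below scores a review against one topic. The topic weights, the `login_topic` name and the `topic_score` function are all hypothetical; in the researchers' setting the weights would be estimated from the reviews rather than written by hand.

```python
# Hypothetical topic weights, invented for illustration only.
login_topic = {"login": 0.5, "crashes": 0.3, "password": 0.2}

def topic_score(review, topic_weights):
    """Average the topic weight of each word in the review.

    Words outside the topic contribute zero, so the score rises with the
    share of the review devoted to heavily weighted words.
    """
    words = review.lower().split()
    if not words:
        return 0.0
    return sum(topic_weights.get(word, 0.0) for word in words) / len(words)

print(topic_score("Login crashes again today", login_topic))
print(topic_score("great prices", login_topic))
```

A review that dwells on login failures scores higher on this topic than one about prices, which is the kind of signal a developer could use to rank complaints.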

The introduction to the paper states:

We create a supervised topic modeling approach for app developers to use mobile reviews as useful sources of quality and customer feedback, thereby complementing traditional software testing. The approach is based on a constrained matrix factorization that leverages the relationship between term frequency and a given response variable in addition to co-occurrences between terms to recover topics that are both predictive of consumer sentiment and useful for understanding the underlying textual themes.
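The general idea of a factorization that uses both term counts and a response variable can be sketched in a few lines. What follows is a simplified stand-in, not the paper's algorithm: it assumes nonnegative ratings, appends the rating as an extra column of the word-count matrix, and runs ordinary nonnegative matrix factorization with Lee-Seung multiplicative updates. The data, the `supervised_nmf` name and the `weight` parameter are assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy data: 6 reviews x 5 terms (word counts) and star ratings.
terms = ["crash", "login", "slow", "easy", "cheap"]
X = np.array([
    [2, 1, 0, 0, 0],
    [1, 2, 1, 0, 0],
    [1, 0, 2, 0, 0],
    [0, 0, 0, 2, 1],
    [0, 0, 0, 1, 2],
    [0, 0, 0, 2, 2],
], dtype=float)
y = np.array([1, 1, 2, 5, 4, 5], dtype=float)

def supervised_nmf(X, y, n_topics=2, weight=1.0, n_iter=500, seed=0):
    """Factor [X | weight*y] ~ W @ [H | weight*b], all entries nonnegative.

    Appending the rating as an extra column pushes the topics to explain
    both the word counts and the rating at the same time.
    """
    rng = np.random.default_rng(seed)
    A = np.hstack([X, weight * y[:, None]])        # augmented data matrix
    W = rng.random((A.shape[0], n_topics)) + 0.1   # review-by-topic loadings
    G = rng.random((n_topics, A.shape[1])) + 0.1   # topic-by-(terms + rating)
    eps = 1e-9                                     # avoids division by zero
    losses = []
    for _ in range(n_iter):
        # Multiplicative updates keep W and G nonnegative.
        G *= (W.T @ A) / (W.T @ W @ G + eps)
        W *= (A @ G.T) / (W @ G @ G.T + eps)
        losses.append(float(np.sum((A - W @ G) ** 2)))
    H = G[:, :-1]           # term weights for each topic
    b = G[:, -1] / weight   # rating contribution of each topic
    return W, H, b, losses

W, H, b, losses = supervised_nmf(X, y)
for k in range(H.shape[0]):
    top = [terms[j] for j in np.argsort(H[k])[::-1][:3]]
    print(f"topic {k}: top terms {top}, rating weight {b[k]:.2f}")
```

Each row of `H` is a topic expressed as weights over words, and `b` links each topic to the rating, so a topic with a small rating weight relative to its word weights points at complaints. Raising `weight` makes the fit favor predicting the rating over reconstructing the word counts.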

The new approach was tested on simulated and real data, using more than 100,000 reviews of 162 versions of online travel agent apps, and was found to outperform standard methods in forecasting accuracy. That, in turn, reportedly helps organizations determine how frequently to release new app versions.

"In text mining, there is a super popular class of methods based on Bayesian modeling," Mankad said. "The field can get dogmatic about what technique to use. In this paper, we're doing something different by trying a matrix factorization method. To me, it's OK to try a new method when you think it may have an advantage in certain situations."

About the Author

David Ramel is an editor and writer at Converge 360.