Abstract

Social scientists are embracing the idea of using "text as data" as a way to quantify and evaluate social theories. I'll discuss a brief history of how this strategy has worked and evolved, and pitch some new approaches for combining social measurement with state-of-the-art natural language processing. We'll focus on the massive multinomial regression models that serve as a basis for text analysis and the distributed computing strategies that allow inference on truly Big Data. I'll then work through a number of examples of social science questions being asked and answered via statistical NLP, with data from online reviews on Yelp, the US congressional record, and communications between buyers and sellers on eBay.
