Text Classification for Organizational Researchers: A Tutorial

[We’re pleased to welcome author Dr. Stefan Mol of the University of Amsterdam. Dr. Mol recently published an article in Organizational Research Methods entitled “Text Classification for Organizational Researchers: A Tutorial,” which is currently free to read for a limited time. Below, Dr. Mol reflects on the inspiration for conducting this research:]

What motivated you to pursue this research?
Machine learning-assisted text analysis is still uncommon in organizational research, although it holds promise. Most manual text analysis procedures in this field, such as thematic and template analysis, involve assigning text to categories. However, manual classification becomes laborious and time-consuming (and sometimes subject to reliability issues) when it must be applied to a sizeable number (hundreds of thousands or millions) of pieces of text. An alternative is for researchers to construct automatic text classification systems, which speed up the process of labeling or coding large sets of textual data. Designing and building text classifiers could be useful in various areas of organizational research. Our aim was to illustrate how this can be done and to provide a tutorial. As an example, we built a text classifier that automatically sorts job type information contained in job vacancies. We demonstrated the importance of validating the results of text classification through data triangulation, using expert input. We believe that the use of this procedure among organizational researchers can improve the reliability and efficiency of analyses that involve classification.
What has been the most challenging aspect of conducting your research? Were there any surprising findings?
Building classifiers involves several rounds of training, testing, and validation before they can be deployed in practice. The most challenging aspect is training the classifier and choosing its parameters in such a way that the results are valid from the standpoint of the application. The classifier we built for the job analysis task was able to recover job task sentences with high precision, as assessed by an expert in the field, even though it was initially trained with minimal expert input. Our results thus suggest that job vacancies are a reliable alternative source of job information that can augment existing approaches to job analysis. More generally, we believe this suggests that wider use of text classification holds promise for organizational research.
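To make the train, test, and validate cycle concrete, here is a minimal sketch in Python. It is not the classifier from the article: it swaps in a small multinomial naive Bayes model written from scratch, and the sentences, the labels ("task" and "other"), and all function names are hypothetical and chosen only for illustration. Precision is computed against held-out labels, which stand in for the expert judgment described above.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(examples):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocab.add(token)
    return {"label_counts": label_counts, "word_counts": word_counts, "vocab": vocab}


def predict(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    total_docs = sum(model["label_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["label_counts"].items():
        score = math.log(n_docs / total_docs)
        denom = sum(model["word_counts"][label].values()) + vocab_size
        for token in tokenize(text):
            if token in model["vocab"]:  # ignore words never seen in training
                score += math.log((model["word_counts"][label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def precision(predictions, gold, positive):
    """Share of items predicted as `positive` that truly are `positive`."""
    tp = sum(1 for p, g in zip(predictions, gold) if p == positive and g == positive)
    fp = sum(1 for p, g in zip(predictions, gold) if p == positive and g != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical training sentences: job task statements vs. other vacancy text.
train = [
    ("prepare monthly financial reports", "task"),
    ("manage a team of engineers", "task"),
    ("develop and test software", "task"),
    ("analyze customer data", "task"),
    ("we offer a competitive salary", "other"),
    ("our office is in amsterdam", "other"),
    ("we are a fast growing company", "other"),
    ("apply before the end of the month", "other"),
]

# Hypothetical held-out sentences, labeled as an expert might label them.
test = [
    ("prepare and analyze financial data", "task"),
    ("manage software reports", "task"),
    ("we offer a great office", "other"),
    ("our company is growing", "other"),
]

model = train_naive_bayes(train)
predictions = [predict(model, text) for text, _ in test]
gold = [label for _, label in test]
task_precision = precision(predictions, gold, positive="task")
print(predictions, task_precision)
```

In a real study the held-out set would be far larger, and parameter choices such as the smoothing constant or the tokenizer would be tuned on a separate validation set rather than on the test sentences.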
What did not make it into your published manuscript that you would like to share with us?
One class of techniques that is now increasingly applied in text classification is word embeddings. Word embeddings map each word to a vector of real numbers. The similarities among word vectors can be used to quantify and categorize the meaning of words in specific contexts. We initially planned to include a short discussion of this, but we decided not to because these techniques warrant a more in-depth discussion that goes beyond the scope of our current article. However, organizational researchers interested in recovering the context-specific meaning of words may benefit from the approach taken with word embeddings, and we recommend that they become familiar with these techniques as well.
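The idea of comparing word vectors can be sketched in a few lines of Python. The three-dimensional vectors below are invented for illustration and do not come from any trained model (real embeddings typically have tens to hundreds of dimensions). The function names are likewise hypothetical. Cosine similarity is used here as one common way to measure how close two word vectors are.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for zero vectors in this sketch
    return dot / (norm_u * norm_v)


# Toy, hand-made vectors (assumption: not taken from any real embedding model).
toy_embeddings = {
    "manager": [0.90, 0.80, 0.10],
    "supervisor": [0.85, 0.75, 0.20],
    "spreadsheet": [0.10, 0.20, 0.90],
}


def most_similar(word, embeddings):
    """Return the other word whose vector is closest to `word` by cosine similarity."""
    target = embeddings[word]
    candidates = [w for w in embeddings if w != word]
    return max(candidates, key=lambda w: cosine_similarity(target, embeddings[w]))


print(most_similar("manager", toy_embeddings))
```

With these toy values, "manager" and "supervisor" point in nearly the same direction, while "spreadsheet" does not, which is the kind of pattern researchers look for when using embeddings to group words by meaning.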
