Are We There Yet? Thematic Analysis, NLP, and Machine Learning for Research




Keywords: Thematic analysis, Natural language processing, Machine learning, Qualitative analysis, Comparative review


Thematic analysis is a well-established technique for qualitative analysis that is covered in traditional research methods training. Its objective is to elicit themes and significant topics from discursive data such as free-style discussions, semi-structured or unstructured interviews, and comments. The approach is laborious and time-consuming, and it requires significant input from researchers to identify and code the themes, although software tools such as NVivo, T-Lab and IRaMuTeQ can aid with the presentation of results. Recent developments in Machine Learning (ML) and Natural Language Processing (NLP) have boosted interest in text analytics and its applications to social science research. For example, automatic topic identification using ML-based NLP offers valuable insights in social media analytics. However, machine learning techniques conventionally rely on large data sets to enable the algorithm to elicit themes, and more recent research efforts have turned to the performance of machine learning approaches on smaller data sets. This study compares the effectiveness of ML-based NLP against human-generated themes, using the text analytics tools NVivo, T-Lab and IRaMuTeQ, as well as the low-code ML tool KNIME, to elicit themes automatically in two contexts: academic literature review in service operations management research, and semi-structured customer interviews. Results indicate that the ML-based NLP approach has the potential to detect research themes automatically even with small data sets, although the results vary across the tools and depend on the capabilities of their built-in text analytics algorithms. In particular, T-Lab offered the best mapping of machine-learning-derived topics to researcher themes, and KNIME proved the most robust software, able to derive meaningful topics even with very small sample sizes.
The implications for training research students are also significant: the results suggest that including ML-based NLP tools and algorithms in the training curriculum of social scientists may be beneficial.