Advancing Qualitative Analysis: An Exploration of the Potential of Generative AI and NLP in Thematic Coding
54 Pages Posted: 26 Jun 2023
Date Written: June 21, 2023
Abstract
Background: Traditional manual coding in qualitative data analysis can be labor-intensive and time-consuming, especially with large data sets. This research investigates the potential use of natural language processing (NLP) techniques and large language models (LLMs), such as GPT-3.5, to enhance efficiency and depth of insights during the qualitative data coding process.
Method: We compared traditional manual thematic analysis with two NLP-assisted approaches, NLP Cluster Assisted (NLPCA) and NLP with GPT-3.5 (NLPGPT), using a dataset of 3,800 student responses on “exam wrappers” from an engineering physics course. Exam wrappers are structured reflection activities that prompt students to practice self-reflection after they get their graded exams back. Agreement between the methods was evaluated based on the similarity of the generated codes.
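As an illustration of how agreement between two sets of generated codes might be scored, here is a minimal sketch. The abstract does not say which similarity measure was used, so the word-level Jaccard overlap, the 0.5 threshold, the function names, and the example code labels below are all assumptions made for illustration, not the authors' method.

```python
import re


def tokens(code):
    """Lowercase a code label and split it into a set of words."""
    return set(re.findall(r"[a-z]+", code.lower()))


def jaccard(a, b):
    """Word-level Jaccard similarity between two code labels (assumed metric)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def best_matches(manual_codes, machine_codes, threshold=0.5):
    """For each manual code, find the most similar machine-generated code.

    A match is kept only if its similarity reaches the (hypothetical) threshold.
    """
    matches = {}
    for manual in manual_codes:
        best = max(machine_codes, key=lambda c: jaccard(manual, c), default=None)
        score = jaccard(manual, best) if best is not None else 0.0
        matches[manual] = (best if score >= threshold else None, score)
    return matches


def agreement_rate(manual_codes, machine_codes, threshold=0.5):
    """Share of manual codes that have a sufficiently similar machine code."""
    if not manual_codes:
        return 0.0
    matches = best_matches(manual_codes, machine_codes, threshold)
    matched = sum(1 for best, _ in matches.values() if best is not None)
    return matched / len(manual_codes)


if __name__ == "__main__":
    # Hypothetical code labels, not taken from the study's data.
    manual = ["time management", "careless mistakes"]
    machine = ["poor time management", "study group"]
    print(agreement_rate(manual, machine))
```

Word overlap is a crude proxy: two codes that express the same theme in different vocabulary would score zero, so an embedding-based similarity may be a better fit in practice.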
Results: Both NLPCA and NLPGPT effectively identified similar themes in the student responses, demonstrating a promising alternative to traditional qualitative coding. Notably, the GPT-3.5 model exhibited strength in producing highly granular codes, which could offer deeper and more nuanced insights.
Discussion: The results underscore the benefits of integrating NLP and LLMs into qualitative research. The study identified challenges, including biases in language models, overfitting in the form of overly granular codes, and resource constraints, but the findings suggest these hurdles can be addressed with further research and refinement of the methodology. The application of NLP and LLMs in other research contexts still needs validation, which is a promising direction for future studies. This research is a step toward enhancing traditional qualitative research with AI technology, paving the way for more scalable, robust, and efficient research methodologies.
Keywords: natural language processing, qualitative analysis, ChatGPT, large language models