Ward Emma, Naughton Felix, Belderson Pippa, Papakonstantinou Trisevgeni, Ainsworth Ben, Hanson Sarah, Notley Caitlin, Bondaronek Paulina
Faculty of Medicine and Health Sciences, University of East Anglia, Norwich, UK.
Department for Experimental Psychology, University College London, London, UK.
Br J Health Psychol. 2025 Sep;30(3):e70017. doi: 10.1111/bjhp.70017.
Investigate the use of machine learning to expedite thematic analysis of qualitative data concerning factors that influenced health behaviours and wellbeing during the COVID-19 pandemic.
Qualitative investigation using Machine-Assisted Topic Analysis (MATA) of free-text data collected from a prospective cohort.
Free-text survey data (2177 responses from 762 participants) on influences on health behaviours and wellbeing were collected using Qualtrics from UK participants recruited online, at 3, 6, 12 and 24 months after the COVID-19 pandemic started. MATA, which employs structural topic modelling (STM), was used (in R) to discern latent topics within the responses. Two researchers independently labelled topics and collaboratively organized them into themes, with 'sense checking' from two additional researchers. Plots and rankings were generated showing change in topic prevalence over time. Total researcher time to complete the analysis was collated.
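To illustrate the "topic prevalence by time" step, the following is a minimal sketch in Python, not the authors' R workflow. It assumes a topic model has already produced a per-response vector of topic proportions, and it simply averages those proportions within each survey wave and ranks the topics. STM itself estimates prevalence through a regression on covariates, so plain averaging is a simplifying assumption. The function names and toy data are hypothetical.

```python
from collections import defaultdict


def topic_prevalence_by_wave(doc_topic, waves):
    """Average each topic's proportion across the responses in each wave."""
    sums = {}
    counts = defaultdict(int)
    for proportions, wave in zip(doc_topic, waves):
        if wave not in sums:
            sums[wave] = [0.0] * len(proportions)
        for k, p in enumerate(proportions):
            sums[wave][k] += p
        counts[wave] += 1
    return {w: [s / counts[w] for s in sums[w]] for w in sums}


def rank_topics(prevalence):
    """Return topic indices ordered from most to least prevalent."""
    return sorted(range(len(prevalence)), key=lambda k: prevalence[k], reverse=True)


# Hypothetical toy data: 4 responses, 3 topics, two survey waves (in months).
doc_topic = [
    [0.6, 0.3, 0.1],
    [0.4, 0.4, 0.2],
    [0.1, 0.2, 0.7],
    [0.2, 0.2, 0.6],
]
waves = [3, 3, 24, 24]

prevalence = topic_prevalence_by_wave(doc_topic, waves)
for wave in sorted(prevalence):
    # Print the wave, its mean topic proportions, and the topic ranking.
    print(wave, [round(p, 2) for p in prevalence[wave]], rank_topics(prevalence[wave]))
```

In this toy example the first topic dominates at month 3 and the third at month 24, which is the kind of shift that prevalence plots and rankings are meant to surface for researcher interpretation.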
Fifteen STM-generated topics were labelled and integrated into six themes: the influences of and impacts on (1) health behaviours, (2) physical health, (3) mood and (4) how these interacted, partly moderated by (5) external influences of control and (6) reflections on wellbeing and personal growth. Topic prevalence varied meaningfully over time, aligning with changes in the pandemic context. Themes were generated (excluding write-up) with 20 h of combined researcher time.
MATA shows promise as a resource-saving method for thematic analysis of large qualitative datasets whilst maintaining researcher control and insight. Findings show the interconnection between health behaviours, physical health and wellbeing over the pandemic, and the influence of control and reflective processes.