Choi Yeunhyang Catherine, Poppe Katrina, Selak Vanessa, Moffitt Allan Ronald, Chung Claris Yee Seung, Ullmer Jane, Wells Sue
Department of Epidemiology and Biostatistics, University of Auckland, Auckland, New Zealand
Department of Medicine, University of Auckland, Auckland, New Zealand
BMJ Health Care Inform. 2025 May 22;32(1):e101393. doi: 10.1136/bmjhci-2024-101393.
This study examined whether incorporating free-text entries into structured general practice records improves the detection of long-term conditions (LTCs) and multimorbidity (MM) in New Zealand (NZ) general practices.
Data from 374 071 deidentified individuals in general practices were analysed to identify 61 LTCs. Structured data were extracted using Read codes from a national master list. In randomised samples, clinical raters independently identified condition-related free text, including synonyms, negation terms and common misspellings. Keywords were categorised and refined through ten iterative tests. A programmatic text classifier was developed and assessed against gold-standard clinician ratings using sensitivity, specificity, positive predictive value (PPV) and F-score.
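The keyword approach described above (synonyms, misspellings and negation terms) can be illustrated with a minimal sketch. The lexicon, the negation cues, the three-token window and the function name `classify` are all hypothetical assumptions for illustration; the study's actual keyword lists and matching rules are not reproduced here.

```python
import re

# Hypothetical lexicon: synonyms, an abbreviation and a common misspelling per
# condition. Illustrative assumptions only, not the study's keyword lists.
CONDITION_TERMS = {
    "diabetes": {"diabetes", "diabetic", "t2dm", "diabetis"},
    "asthma": {"asthma", "asthmatic", "asthama"},
}
# Hypothetical negation cues, looked for in a short window before a keyword.
NEGATION_CUES = {"no", "not", "denies", "nil", "without"}
WINDOW = 3  # number of preceding tokens in the same clause to inspect


def classify(text: str) -> set[str]:
    """Return the conditions mentioned, and not negated, in a free-text entry."""
    found: set[str] = set()
    # Split into clauses so a negation cue cannot leak across punctuation.
    for clause in re.split(r"[.,;:\n]", text.lower()):
        tokens = re.findall(r"[a-z0-9]+", clause)
        for i, token in enumerate(tokens):
            for condition, terms in CONDITION_TERMS.items():
                if token not in terms:
                    continue
                preceding = tokens[max(0, i - WINDOW):i]
                if NEGATION_CUES.isdisjoint(preceding):
                    found.add(condition)
    return found


print(classify("T2DM on metformin"))                    # {'diabetes'}
print(classify("no asthma, diabetis well controlled"))  # {'diabetes'}
print(classify("Pt denies asthma"))                     # set()
```

Splitting on punctuation before checking the window is a simple design choice that keeps "no asthma" from negating a diabetes mention in the next clause; a production classifier would likely need richer rules.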
A quarter of general practitioner classifications either contained unrecognised Read codes or consisted of free text only. Clinician inter-rater reliability was high (kappa ≥0.9). Compared with the clinical gold standard, text classification yielded an average sensitivity of 88%, specificity of 99% and PPV of 95%, with F-scores ranging from 82% to 95%. Incorporating free text increased LTC prevalence from 42.1% to 46.3% and reduced misclassification of MM, identifying 12 626 additional patients with MM and 15 972 additional patients with at least one LTC.
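The reported metrics follow from a confusion matrix of classifier output against clinician ratings. The sketch below computes them, taking the F-score to be F1 (the harmonic mean of PPV and sensitivity) and the agreement statistic to be Cohen's kappa for two raters making a binary judgement; both are assumptions, since the abstract does not name the exact variants. The counts are hypothetical, not the study's data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from comparing classifier output with clinician ratings."""
    tp: int  # classifier positive, clinician positive
    fp: int  # classifier positive, clinician negative
    fn: int  # classifier negative, clinician positive
    tn: int  # classifier negative, clinician negative


def metrics(c: Confusion) -> dict[str, float]:
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    ppv = c.tp / (c.tp + c.fp)
    # F-score (F1): harmonic mean of PPV (precision) and sensitivity (recall).
    f_score = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "f_score": f_score}


def cohen_kappa(both_yes: int, only_a: int, only_b: int, both_no: int) -> float:
    """Cohen's kappa for two raters making a binary present/absent judgement."""
    n = both_yes + only_a + only_b + both_no
    observed = (both_yes + both_no) / n
    a_yes, b_yes = both_yes + only_a, both_yes + only_b
    # Chance agreement from each rater's marginal yes/no proportions.
    expected = (a_yes * b_yes + (n - a_yes) * (n - b_yes)) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical counts, not the study's data.
print(metrics(Confusion(tp=88, fp=5, fn=12, tn=895)))
print(round(cohen_kappa(both_yes=45, only_a=2, only_b=3, both_no=50), 3))  # 0.9
```

Because specificity depends on true negatives, a condition that is rare in the sample can show very high specificity even when PPV is modest, which is one reason PPV and F-score are reported alongside it.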
During routine clinical workflow, general practitioners face barriers to accurate LTC coding, or may simply annotate records with text-based descriptions. Programmatic text classification performed well and identified many more patients receiving LTC care.
Combining structured and unstructured data optimises MM detection in NZ general practices and has the potential to improve case management, follow-up care and allocation of healthcare resources.
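Combining the two sources amounts to taking, for each patient, the union of conditions found via Read codes and via free text, then recounting LTC and MM status. The sketch below assumes, as is common, that MM means two or more LTCs; the abstract does not state the threshold. The patient IDs, condition sets and population size are hypothetical.

```python
# Hypothetical per-patient condition sets; patient IDs are made up.
Conditions = dict[str, set[str]]


def combine(coded: Conditions, text: Conditions) -> Conditions:
    """Union the LTCs found via Read codes and via free-text classification."""
    patients = coded.keys() | text.keys()
    return {p: coded.get(p, set()) | text.get(p, set()) for p in patients}


def summarise(conditions: Conditions, population: int) -> dict[str, float]:
    """Count patients with >=1 LTC and with MM, assumed here to mean >=2 LTCs."""
    ltc = sum(1 for s in conditions.values() if len(s) >= 1)
    mm = sum(1 for s in conditions.values() if len(s) >= 2)
    return {"ltc_count": ltc, "mm_count": mm, "ltc_prevalence": ltc / population}


coded = {"p1": {"diabetes"}, "p2": {"asthma", "gout"}}
text = {"p1": {"heart failure"}, "p3": {"asthma"}}
population = 5  # includes patients with no recorded LTC

print(summarise(coded, population))                 # codes only
print(summarise(combine(coded, text), population))  # codes plus free text
```

In this toy example, free text moves p1 from one LTC to MM and adds p3 as a new LTC case, mirroring in miniature how the study's additional MM and LTC patients arise.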