McKenzie Jordan, Rajapakshe Rasika, Shen Hua, Rajapakshe Shan, Lin Angela
Northern Medical Program, Faculty of Medicine, University of British Columbia, Prince George, BC, Canada.
Medical Physics, BC Cancer, Kelowna, BC, Canada.
JMIR Med Inform. 2021 Nov 12;9(11):e29241. doi: 10.2196/29241.
Health research frequently requires manual chart reviews to identify patients in a study-specific cohort and examine their clinical outcomes. Manual chart review is a labor-intensive process that demands a significant time investment from clinical researchers.
This study aims to evaluate the feasibility and accuracy of an assisted chart review program, using an in-house rule-based text-extraction program written in Python, to identify patients who developed radiation pneumonitis (RP) after receiving curative radiotherapy.
A retrospective manual chart review was completed for patients who received curative radiotherapy for stage 2-3 lung cancer from January 1, 2013, to December 31, 2015, at British Columbia Cancer, Kelowna Centre. In the manual chart review, RP diagnosis and grading were recorded using the Common Terminology Criteria for Adverse Events version 5.0. A total of 1413 clinical documents from the charts of 50 sample patients were obtained from the electronic medical record system for review. The text-extraction program was built using the Natural Language Toolkit (NLTK) Python platform and regular expressions (RegEx), and was run with Python version 3.7.2. Its output was a list of the full sentences containing the key terms, together with the document IDs and dates of the documents from which these sentences were extracted. The results of the manual review served as the gold standard against which the results of the text-extraction program were compared.
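The extraction step described above can be illustrated with a minimal sketch. It is not the study's program: the key-term list, the `Hit` record, the `extract_sentences` function, and the document dictionary layout are all hypothetical, and a simple regular-expression sentence splitter stands in for the NLTK sentence tokenizer so that the example runs with the standard library alone.

```python
import re
from dataclasses import dataclass

# Hypothetical key terms; the abstract does not list the terms actually used.
KEY_TERMS = ("pneumonitis",)


@dataclass
class Hit:
    """One extracted sentence plus the document it came from."""
    doc_id: str
    date: str
    sentence: str


def extract_sentences(documents, key_terms=KEY_TERMS):
    """Return every full sentence that contains at least one key term.

    `documents` is an iterable of dicts with the assumed keys
    'doc_id', 'date', and 'text'.
    """
    # One case-insensitive pattern matching any key term as a whole word.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in key_terms) + r")\b",
        re.IGNORECASE,
    )
    hits = []
    for doc in documents:
        # Naive splitter: break after '.', '!' or '?' followed by whitespace.
        # (Assumption: a stand-in for a proper sentence tokenizer.)
        for sentence in re.split(r"(?<=[.!?])\s+", doc["text"].strip()):
            if pattern.search(sentence):
                hits.append(Hit(doc["doc_id"], doc["date"], sentence))
    return hits


# Made-up example documents, not data from the study.
example_docs = [
    {"doc_id": "D1", "date": "2014-03-02",
     "text": "Patient reports cough. Imaging suggests grade 2 radiation "
             "pneumonitis. Follow up in 3 weeks."},
    {"doc_id": "D2", "date": "2014-05-10",
     "text": "No respiratory complaints."},
]
example_hits = extract_sentences(example_docs)
```

A reviewer would then read only the sentences in `example_hits`, each traceable to its source document by ID and date, instead of the full set of documents.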
Fifty percent (25/50) of the sample patients developed grade ≥1 RP; the text-extraction program was able to ascertain 92% (23/25) of these patients (sensitivity 0.92, 95% CI 0.74-0.99; specificity 0.36, 95% CI 0.18-0.57). Furthermore, the text-extraction program correctly identified all 9 patients with grade ≥2 RP, that is, patients with clinically significant symptoms (sensitivity 1.0, 95% CI 0.66-1.0; specificity 0.27, 95% CI 0.14-0.43). The program was useful for distinguishing patients with RP from those without RP. The text-extraction program in this study avoided unnecessary manual review of 22% (11/50) of the sample patients, as these patients were identified as grade 0 RP and would not require further manual review in subsequent studies.
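The reported accuracy figures can be reproduced from the counts with a short sketch. Two assumptions apply. First, the counts for grade ≥1 RP (23 true positives, 2 false negatives, 9 true negatives, 16 false positives) are derived here from the reported proportions rather than stated in the abstract. Second, the abstract does not name its confidence interval method; the exact (Clopper-Pearson) binomial interval used below is an assumption that appears consistent with the reported bounds. The function names are illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided binomial CI for k successes in n trials (bisection)."""
    def solve(increasing_fn):
        # Find p in [0, 1] where increasing_fn(p) crosses alpha / 2.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if increasing_fn(mid) < alpha / 2:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= k | p) = alpha / 2, which increases with p.
    lower = 0.0 if k == 0 else solve(lambda p: 1 - binom_cdf(k - 1, n, p))
    # Upper bound: P(X <= k | p) = alpha / 2; the CDF decreases with p,
    # so solve on its negation shifted to be increasing.
    upper = 1.0 if k == n else solve(lambda p: alpha - binom_cdf(k, n, p))
    return lower, upper


# Counts for grade >= 1 RP, derived from the reported proportions (assumption).
tp, fn, tn, fp = 23, 2, 9, 16

sensitivity = tp / (tp + fn)   # 23/25
specificity = tn / (tn + fp)   # 9/25
sens_ci = clopper_pearson(tp, tp + fn)
spec_ci = clopper_pearson(tn, tn + fp)

# Patients the program labeled grade 0 need no further manual review.
review_avoided = (tn + fn) / (tp + fn + tn + fp)   # 11/50

print(f"sensitivity {sensitivity:.2f}, 95% CI {sens_ci[0]:.2f}-{sens_ci[1]:.2f}")
print(f"specificity {specificity:.2f}, 95% CI {spec_ci[0]:.2f}-{spec_ci[1]:.2f}")
print(f"manual review avoided: {review_avoided:.0%}")
```

Note that the 11 patients labeled grade 0 comprise the 9 true negatives and the 2 patients with RP whom the program missed, which is how a sensitivity of 0.92 and a 22% reduction in review workload arise from the same counts.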
This feasibility study showed that the text-extraction program was able to assist with the identification of patients who developed RP after curative radiotherapy. The program streamlines the manual chart review further by identifying the key sentences of interest. This work has the potential to improve future clinical research, as the text-extraction program shows promise in performing chart review in a more time-efficient manner, compared with the traditional labor-intensive manual chart review.