Department of Clinical Pharmacy, St Antonius Hospital, Utrecht, the Netherlands.
Division of Pharmacoepidemiology and Clinical Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Utrecht University, Utrecht, the Netherlands.
JCO Clin Cancer Inform. 2024 Oct;8:e2400053. doi: 10.1200/CCI.24.00053. Epub 2024 Oct 4.
The objective was to develop and evaluate the portability of a text mining algorithm for prospectively capturing disease progression in electronic health record (EHR) data of patients with metastatic non-small cell lung cancer (mNSCLC) treated with immunochemotherapy.
This study used EHR data from patients with mNSCLC receiving immunochemotherapy (between October 1, 2018, and December 31, 2022) in four Dutch hospitals. A text mining algorithm for capturing disease progression was developed in hospitals 1 and 2 and then transferred to hospitals 3 and 4 to evaluate portability. Performance metrics were calculated by comparing the algorithm's output with manual chart review. In addition, data were simulated to become available over time to assess performance in real-time applications. Median progression-free survival (PFS) was calculated using the Kaplan-Meier method to compare text mining with manual chart review.
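The comparison against manual chart review and the Kaplan-Meier median PFS can be illustrated with a minimal sketch. The abstract does not name the performance metrics it reports, so sensitivity, specificity, PPV and NPV are assumptions here, as are the function names `confusion_metrics` and `km_median`, which are hypothetical. This is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Metrics:
    # Assumed metric set; the abstract only reports "performance scores".
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def confusion_metrics(predicted: Sequence[bool], reference: Sequence[bool]) -> Metrics:
    """Compare per-patient progression labels from text mining (predicted)
    with manual chart review (reference, treated as the gold standard)."""
    pairs = list(zip(predicted, reference))
    tp = sum(p and r for p, r in pairs)
    tn = sum((not p) and (not r) for p, r in pairs)
    fp = sum(p and (not r) for p, r in pairs)
    fn = sum((not p) and r for p, r in pairs)

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return Metrics(ratio(tp, tp + fn), ratio(tn, tn + fp),
                   ratio(tp, tp + fp), ratio(tn, tn + fn))


def km_median(times: Sequence[float], events: Sequence[bool]) -> Optional[float]:
    """Kaplan-Meier median: the smallest time t at which S(t) <= 0.5.
    events[i] is True for progression/death, False for censoring.
    Returns None if the survival curve never drops to 0.5."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_at_t = 0
        # Group all subjects (events and censorings) sharing this time point.
        while i < len(data) and data[i][0] == t:
            n_events += int(data[i][1])
            n_at_t += 1
            i += 1
        if n_events:
            surv *= 1 - n_events / at_risk
            # Small tolerance guards against floating-point error at exactly 0.5.
            if surv <= 0.5 + 1e-12:
                return t
        # Censored subjects leave the risk set after their time point.
        at_risk -= n_at_t
    return None
```

Running `km_median` once on text-mined progression times and once on chart-review times gives the two medians to compare; in practice a survival library would also supply confidence intervals.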
During development and portability testing, the text mining algorithm performed well in capturing disease progression, with all performance scores >90%. When real-time performance was simulated, the performance scores in all four hospitals exceeded 90% from week 15 after the start of follow-up. Although the exact progression dates differed in 46 of the 157 patients with progressive disease, the numbers of patients labeled with progression too early (n = 24) and too late (n = 22) were well balanced, with discrepancies ranging from -116 to 384 days. Despite these discrepancies, the PFS curves constructed with text mining and manual chart review were highly similar for each hospital.
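The early/late tally of progression dates can be sketched as below. It assumes a signed discrepancy defined as text-mining date minus chart-review date, so negative values mean "too early". The abstract does not state its sign convention. The function name and the toy patient dates are hypothetical and are not study data.

```python
from datetime import date
from typing import Dict, Tuple


def summarize_discrepancies(pairs: Dict[str, Tuple[date, date]]) -> dict:
    """pairs maps a (hypothetical) patient id to
    (text_mining_progression_date, chart_review_progression_date),
    restricted to patients with progression in both sources."""
    # Assumed sign convention: negative = text mining labeled progression too early.
    diffs = [(tm - cr).days for tm, cr in pairs.values()]
    differing = [d for d in diffs if d != 0]
    return {
        "n_differing": len(differing),
        "too_early": sum(d < 0 for d in differing),
        "too_late": sum(d > 0 for d in differing),
        "min_days": min(differing) if differing else 0,
        "max_days": max(differing) if differing else 0,
    }


# Toy example with made-up dates.
example = {
    "p1": (date(2021, 3, 1), date(2021, 3, 1)),    # exact match
    "p2": (date(2021, 2, 20), date(2021, 3, 1)),   # 9 days early
    "p3": (date(2021, 5, 10), date(2021, 4, 30)),  # 10 days late
}
print(summarize_discrepancies(example))
```

A roughly equal split between early and late labels, as reported above, is what keeps the aggregate PFS curves close even when individual dates disagree.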
In this study, an accurate text mining algorithm for capturing disease progression in the EHR data of patients with mNSCLC was developed. The algorithm was portable across different hospitals, and the performance over time was good, making this an interesting approach for prospective follow-up of multicenter cohorts.