Development and Portability of a Text Mining Algorithm for Capturing Disease Progression in Electronic Health Records of Patients With Stage IV Non-Small Cell Lung Cancer.

Author information

Department of Clinical Pharmacy, St Antonius Hospital, Utrecht, the Netherlands.

Division of Pharmacoepidemiology and Clinical Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Utrecht University, Utrecht, the Netherlands.

Publication information

JCO Clin Cancer Inform. 2024 Oct;8:e2400053. doi: 10.1200/CCI.24.00053. Epub 2024 Oct 4.

Abstract

PURPOSE

The objective was to develop a text mining algorithm for prospectively capturing disease progression in electronic health record (EHR) data of patients with metastatic non-small cell lung cancer (mNSCLC) treated with immunochemotherapy, and to evaluate its portability.

METHODS

This study used EHR data from patients with mNSCLC who received immunochemotherapy between October 1, 2018, and December 31, 2022, in four Dutch hospitals. A text mining algorithm for capturing disease progression was developed in hospitals 1 and 2 and then transferred to hospitals 3 and 4 to evaluate portability. Performance metrics were calculated by comparing the algorithm's output with manual chart review. In addition, data were simulated to become available over time to assess performance in real-time applications. Median progression-free survival (PFS) was calculated using the Kaplan-Meier method to compare text mining with manual chart review.
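The two comparisons described above can be illustrated with a minimal Python sketch. It is not the study's code: the abstract does not say which performance metrics were used, so sensitivity, specificity, PPV, and NPV are assumed here, and the function names `performance_metrics` and `km_median` as well as the sample data are hypothetical.

```python
from typing import Dict, List, Optional


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def performance_metrics(predicted: List[bool], reference: List[bool]) -> Dict[str, float]:
    """Compare algorithm labels with chart-review labels (True = progression).

    The choice of metrics is an assumption; the abstract only says
    "performance metrics".
    """
    pairs = list(zip(predicted, reference))
    tp = sum(p and r for p, r in pairs)
    fp = sum(p and not r for p, r in pairs)
    fn = sum((not p) and r for p, r in pairs)
    tn = sum((not p) and (not r) for p, r in pairs)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def km_median(durations: List[float], events: List[bool]) -> Optional[float]:
    """Kaplan-Meier median: first event time where estimated survival is <= 0.5.

    durations: follow-up time per patient; events: True if progression was
    observed, False if the patient was censored. Returns None if the
    survival estimate never reaches 0.5.
    """
    at_risk = len(durations)
    survival = 1.0
    for t in sorted(set(durations)):
        n_events = sum(1 for d, e in zip(durations, events) if d == t and e)
        n_leaving = sum(1 for d in durations if d == t)
        if n_events:
            survival *= 1 - n_events / at_risk
            if survival <= 0.5:
                return t
        at_risk -= n_leaving  # events and censored patients leave the risk set
    return None


if __name__ == "__main__":
    # Illustrative labels only, not study data.
    algo = [True, True, False, False, True]
    chart = [True, False, False, False, True]
    print(performance_metrics(algo, chart))
    # Illustrative follow-up times in weeks with censoring flags.
    print(km_median([5, 8, 12, 20, 30], [True, False, True, True, False]))
```

Running `km_median` once on the algorithm's progression dates and once on the chart-review dates would give the two median PFS values that the comparison relies on.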

RESULTS

During development and portability testing, the text mining algorithm performed well in capturing disease progression, with all performance scores >90%. When real-time performance was simulated, the performance scores in all four hospitals exceeded 90% from week 15 after the start of follow-up. The exact progression dates differed between text mining and manual chart review in 46 of the 157 patients with progressive disease, with discrepancies ranging from -116 to 384 days. However, the numbers of patients labeled with progression too early (n = 24) and too late (n = 22) were well balanced, and the PFS curves constructed with text mining and manual chart review were highly similar for each hospital.
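The date comparison reported above can be tallied as in the following sketch. The sign convention (negative means the algorithm labeled progression earlier than chart review) is an assumption inferred from the reported range, and the function name `summarize_date_discrepancies` and the sample dates are hypothetical.

```python
from datetime import date
from typing import Dict, List, Tuple


def summarize_date_discrepancies(pairs: List[Tuple[date, date]]) -> Dict[str, int]:
    """Summarize signed differences between algorithm and chart-review dates.

    Each pair is (algorithm_date, chart_review_date). A negative difference
    means the algorithm labeled progression too early (assumed convention).
    """
    diffs = [(algo - manual).days for algo, manual in pairs]
    return {
        "exact": sum(d == 0 for d in diffs),
        "too_early": sum(d < 0 for d in diffs),
        "too_late": sum(d > 0 for d in diffs),
        "min_days": min(diffs),
        "max_days": max(diffs),
    }


if __name__ == "__main__":
    # Illustrative dates only, not study data.
    sample = [
        (date(2021, 3, 1), date(2021, 3, 1)),    # same date
        (date(2021, 2, 1), date(2021, 3, 1)),    # algorithm 28 days early
        (date(2021, 6, 15), date(2021, 6, 1)),   # algorithm 14 days late
    ]
    print(summarize_date_discrepancies(sample))
```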

CONCLUSION

In this study, an accurate text mining algorithm for capturing disease progression in the EHR data of patients with mNSCLC was developed. The algorithm was portable across different hospitals, and the performance over time was good, making this an interesting approach for prospective follow-up of multicenter cohorts.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9881/11469628/aaa5f735848d/cci-8-e2400053-g001.jpg
