Natural language processing algorithms identify wild-type isocitrate dehydrogenase gliomas in electronic health records.

Author information

Forrest Noah, Guggilla Vijeeth, Bell April, Zelisko Susan, Federico Emma M, Power Erica A, Birch Steven, Nandoliya Khizar R, Houskamp Ethan J, Tran Steven, Lukas Rimas V, Johnson Jodi L, Roy Ishan, Wainwright Derek A, Walunas Theresa L

Affiliations

Institute for Artificial Intelligence in Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA.

Informatics and Systems Development, Health Sciences Division, Loyola University Chicago, Maywood, Illinois, USA.

Publication information

Neurooncol Adv. 2025 Jun 7;7(1):vdaf111. doi: 10.1093/noajnl/vdaf111. eCollection 2025 Jan-Dec.

Abstract

BACKGROUND

In 2021, the World Health Organization revised the definition of glioblastoma (GBM) to include only gliomas with wild-type isocitrate dehydrogenase (IDHwt). This reclassification complicates the retrospective identification of patients with GBM, because many patients were diagnosed under the earlier definitions. This study addresses this challenge by applying natural language processing (NLP) to electronic health record (EHR) data to identify patients with IDHwt glioma.

METHODS

We manually adjudicated a subset of 1499 pathology records for evidence of IDHwt glioma and for the methylation status of the O6-methylguanine-DNA methyltransferase (MGMT) promoter. We then trained several regularized logistic regression models to classify IDH mutation status and MGMT promoter methylation status from biomedical concepts identified in the text. The models were subsequently validated at a second site. To further characterize the cohort, we constructed Kaplan-Meier curves stratifying patients by MGMT promoter methylation status and other clinical variables.
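
The abstract does not state the regularization penalty, the concept-extraction tool, or the feature encoding. The following minimal Python sketch assumes binary concept-presence features, an L2 penalty, and plain batch gradient descent; the concept strings and the four adjudicated notes are hypothetical stand-ins for the output of an upstream concept extractor. It illustrates the general technique only and is not the authors' implementation.

```python
import math

# Minimal sketch, not the authors' pipeline. Assumptions (not from the source):
# each pathology note has already been reduced to a set of concept strings by an
# upstream concept extractor, the penalty is L2, and the model is fit by plain
# batch gradient descent. All concept names and notes below are hypothetical.

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def featurize(concepts, vocab):
    """Binary bag-of-concepts vector: 1.0 if the concept occurs in the note."""
    return [1.0 if c in concepts else 0.0 for c in vocab]

def train_logreg(X, y, l2=0.1, lr=0.5, epochs=500):
    """L2-regularized logistic regression; the intercept is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Mean log-loss gradient plus the gradient of (l2 / 2) * ||w||^2.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that a note describes an IDHwt glioma."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical adjudicated notes: label 1 = IDHwt glioma, 0 = not IDHwt.
notes = [
    ({"IDH wild-type", "glioblastoma"}, 1),
    ({"no IDH mutation detected", "glioblastoma"}, 1),
    ({"IDH1 R132H mutation", "astrocytoma"}, 0),
    ({"IDH mutant", "oligodendroglioma"}, 0),
]
vocab = sorted(set().union(*(concepts for concepts, _ in notes)))
X = [featurize(concepts, vocab) for concepts, _ in notes]
y = [label for _, label in notes]
w, b = train_logreg(X, y)
```

With this toy data, a note containing "IDH wild-type" and "glioblastoma" receives a probability above 0.5, and one containing "IDH mutant" and "oligodendroglioma" a probability below 0.5. An L1 penalty, which drives uninformative concept weights to exactly zero, is a common alternative when the concept vocabulary is large.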

RESULTS

The best-performing model for identifying IDHwt glioma achieved an F1 measure of 0.88. Median overall survival differed significantly between patients with methylated and unmethylated MGMT promoters (P < 0.001). Finally, the best-performing IDHwt glioma identification model achieved an F1 measure of 0.962 when implemented at the second site.
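
The F1 measure reported above is the harmonic mean of precision and recall, F1 = 2TP / (2TP + FP + FN), and the survival comparison relies on Kaplan-Meier (product-limit) estimates. The Python sketch below computes both. It defines the median as the first time the estimated survival falls to 0.5 or below, and all counts and survival times are made up for illustration rather than taken from the study. Testing whether two strata differ, as in the MGMT comparison, would additionally require a test such as the log-rank test, which is not shown.

```python
# Illustrative sketch, not the authors' analysis code. All counts and survival
# times below are made up.

def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall = 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)

def kaplan_meier(times, events):
    """Product-limit estimate of S(t).

    times: follow-up times; events: 1 if death was observed, 0 if censored.
    Returns (time, survival) steps, starting at (0, 1.0).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, curve, i = 1.0, [(0.0, 1.0)], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        # Group tied times; subjects censored at t still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve

def median_survival(curve):
    """First time at which S(t) <= 0.5, or None if the median is not reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Usage with hypothetical data (months of follow-up).
print(f1_score(9, 1, 1))  # 0.9
curve = kaplan_meier([3, 6, 6, 9, 14, 18], [1, 1, 0, 1, 1, 1])
print(median_survival(curve))  # 9
```

To mirror the stratified analysis, `kaplan_meier` would be called separately on the methylated and unmethylated groups and the resulting medians compared.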

DISCUSSION

Our results suggest that patients with IDHwt glioma can be identified from pathology notes in the EHR using NLP. Our models displayed excellent performance at a second healthcare institution, demonstrating that they can be used to identify GBM cohorts across multiple sites. Furthermore, our characterization of the Northwestern Medicine (NM) GBM cohort recapitulated known survival trends, demonstrating the utility of EHR data for studying GBM in clinical settings.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0853/12284639/a97e7456199e/vdaf111_fig1.jpg