A Multi-Institutional Natural Language Processing Pipeline to Extract Performance Status From Electronic Health Records.

Author information

Center for Innovations in Quality, Effectiveness, and Safety, Michael E. DeBakey VA Medical Center, Houston, TX, USA.

Department of Medicine, Baylor College of Medicine, Houston, TX, USA.

Publication information

Cancer Control. 2024 Jan-Dec;31:10732748241279518. doi: 10.1177/10732748241279518.

DOI:10.1177/10732748241279518
PMID:39222957
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11369884/
Abstract

PURPOSE

Performance status (PS), an essential indicator of patients' functional abilities, is often documented in clinical notes of patients with cancer. The use of natural language processing (NLP) in extracting PS from electronic medical records (EMRs) has shown promise in enhancing clinical decision-making, patient monitoring, and research studies. We designed and validated a multi-institute NLP pipeline to automatically extract performance status from free-text patient notes.

PATIENTS AND METHODS

We collected data from 19,481 patients in the Harris Health System (HHS) and 333,862 patients in the Veterans Affairs Corporate Data Warehouse (VA-CDW), and randomly selected 400 patients from each data source to train and validate (50%) and test (50%) the proposed pipeline. We designed the NLP pipeline using an expert-derived, rule-based approach combined with extensive post-processing to strengthen its performance. To demonstrate the pipeline's application, we assessed compliance with the PS documentation standard suggested by the American Society of Clinical Oncology (ASCO) Quality Metric and investigated potential disparities in PS reporting for stage IV non-small cell lung cancer (NSCLC). We used logistic regression with patients' race/ethnicity, spoken language, marital status, and gender as covariates.
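
As an illustration of what a rule-based extraction step can look like, the following is a minimal Python sketch. The abstract does not list the expert-derived rules or the post-processing steps, so the regular expressions, the `extract_ps` function name, and the choice to check Karnofsky (KPS) mentions before ECOG mentions are all assumptions made for this example, not the authors' pipeline.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns for illustration only; the actual expert-derived
# rules used in the study are not given in the abstract.
KPS_PATTERN = re.compile(
    r"\b(?:KPS|Karnofsky)\b[^0-9\n]{0,30}?(100|[1-9]0)\b",
    re.IGNORECASE,
)
ECOG_PATTERN = re.compile(
    r"\b(?:ECOG|performance\s+status|PS)\b[^0-9\n]{0,30}?([0-4])\b",
    re.IGNORECASE,
)


def extract_ps(note: str) -> Optional[Tuple[str, int]]:
    """Return (scale, value) for the first PS mention found, or None.

    Karnofsky mentions are checked first so that a phrase such as
    "Karnofsky performance status 80%" is not misread as an ECOG score.
    """
    kps = KPS_PATTERN.search(note)
    if kps:
        return ("KPS", int(kps.group(1)))
    ecog = ECOG_PATTERN.search(note)
    if ecog:
        return ("ECOG", int(ecog.group(1)))
    return None


if __name__ == "__main__":
    for text in (
        "ECOG PS: 1, tolerating chemo",
        "Karnofsky performance status 80%",
        "No functional assessment today",
    ):
        print(repr(text), "->", extract_ps(text))
```

A production pipeline would need far more than this, for example handling ranges such as "ECOG 1-2", negated or historical mentions, and note-level versus patient-level aggregation, which is presumably the role of the post-processing described above.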

RESULTS

On the test sets, the pipeline achieved 92% accuracy on the HHS cohort and 98.5% accuracy on the VA data. For stage IV NSCLC patients, the pipeline achieved an accuracy of 98.5%. Furthermore, our analysis revealed a PS documentation rate of over 85% among NSCLC patients, surpassing the ASCO Quality Metric. No disparities were observed in the documentation of PS.
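
The two summary measures reported here, accuracy against a reference standard and the PS documentation rate, can be computed as in the short Python sketch below. The function names and the toy inputs are hypothetical, and the abstract does not state whether accuracy was scored per patient or per note, so this example simply assumes one extracted value and one reference value per record.

```python
from typing import Optional, Sequence


def accuracy(
    predicted: Sequence[Optional[int]],
    reference: Sequence[Optional[int]],
) -> float:
    """Fraction of records where the extracted PS equals the reference PS.

    None means "no PS documented"; a None/None pair counts as correct.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


def documentation_rate(values: Sequence[Optional[int]]) -> float:
    """Fraction of records with any PS value documented."""
    if not values:
        raise ValueError("input must be non-empty")
    return sum(v is not None for v in values) / len(values)


if __name__ == "__main__":
    # Toy data, not taken from the study.
    extracted = [1, 2, None, 0]
    chart_review = [1, 2, None, 1]
    print("accuracy:", accuracy(extracted, chart_review))
    print("documentation rate:", documentation_rate(chart_review))
```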

CONCLUSION

Our proposed NLP pipeline shows promising results in extracting PS from free-text notes from various health institutions. It may be used in longitudinal cancer data registries.
