Shareable artificial intelligence to extract cancer outcomes from electronic health records for precision oncology research.

Affiliations

Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA, USA.

Memorial Sloan Kettering Cancer Center, 1275 York Ave, New York, NY, USA.

Publication information

Nat Commun. 2024 Nov 12;15(1):9787. doi: 10.1038/s41467-024-54071-x.


DOI:10.1038/s41467-024-54071-x
PMID:39532885
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11557593/
Abstract

Databases that link molecular data to clinical outcomes can inform precision cancer research into novel prognostic and predictive biomarkers. However, outside of clinical trials, cancer outcomes are typically recorded only in text form within electronic health records (EHRs). Artificial intelligence (AI) models have been trained to extract outcomes from individual EHRs. However, patient privacy restrictions have historically precluded dissemination of these models beyond the centers at which they were trained. In this study, the vulnerability of text classification models trained directly on protected health information to membership inference attacks is confirmed. A teacher-student distillation approach is applied to develop shareable models for annotating outcomes from imaging reports and medical oncologist notes. 'Teacher' models trained on EHR data from Dana-Farber Cancer Institute (DFCI) are used to label imaging reports and discharge summaries from the Medical Information Mart for Intensive Care (MIMIC)-IV dataset. 'Student' models are trained to use these MIMIC documents to predict the labels assigned by teacher models and sent to Memorial Sloan Kettering (MSK) for evaluation. The student models exhibit high discrimination across outcomes in both the DFCI and MSK test sets. Leveraging private labeling of public datasets to distill publishable clinical AI models from academic centers could facilitate deployment of machine learning to accelerate precision oncology research.
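
As a rough illustration of the teacher-student distillation described in the abstract, the toy sketch below trains a "teacher" on a handful of made-up private reports, uses it to soft-label made-up public text, and fits a "student" on the public text alone. Everything in it is an assumption for illustration: the bag-of-words logistic regression, the vocabulary, the example sentences, and the names `featurize`, `train_logreg`, and `score` are hypothetical and are not the models or data used in the study.

```python
import math

# Hypothetical toy vocabulary; a real system would use a learned text encoder.
VOCAB = ["progression", "worsening", "new", "lesions", "stable", "improved", "response"]

def featurize(text):
    """Bag-of-words counts over VOCAB (words outside VOCAB are ignored)."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(model, x):
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_logreg(xs, targets, epochs=500, lr=0.5):
    """Logistic regression via per-example gradient steps; targets may be soft (in [0, 1])."""
    weights, bias = [0.0] * len(VOCAB), 0.0
    for _ in range(epochs):
        for x, t in zip(xs, targets):
            err = predict_proba((weights, bias), x) - t  # d(cross-entropy)/d(logit)
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def score(model, text):
    """Predicted probability that a report describes cancer progression."""
    return predict_proba(model, featurize(text))

# Made-up stand-ins for private, human-labeled institutional reports (1 = progression).
PRIVATE_DOCS = [
    ("new lesions consistent with progression", 1),
    ("worsening disease with progression", 1),
    ("stable disease", 0),
    ("improved with treatment response", 0),
]

# Made-up stand-ins for unlabeled documents from a public corpus.
PUBLIC_DOCS = [
    "worsening lesions",
    "progression of disease",
    "stable findings",
    "improved response",
    "new worsening lesions",
    "stable improved",
]

# 1. The teacher is trained on private text and its human labels.
teacher = train_logreg([featurize(t) for t, _ in PRIVATE_DOCS],
                       [float(y) for _, y in PRIVATE_DOCS])

# 2. The teacher assigns soft labels to the public text.
soft_labels = [score(teacher, t) for t in PUBLIC_DOCS]

# 3. The student sees only public text and the teacher's labels, never private text.
student = train_logreg([featurize(t) for t in PUBLIC_DOCS], soft_labels)

for text in ["progression with new lesions", "stable response"]:
    print(text, round(score(teacher, text), 3), round(score(student, text), 3))
```

In this sketch only `student` would leave the institution; `teacher` and `PRIVATE_DOCS` stay behind, which mirrors the idea of sending student models to another center for evaluation.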

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e595/11557593/a2b1fec7d259/41467_2024_54071_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e595/11557593/2b4e6215b301/41467_2024_54071_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e595/11557593/4d0c104ee80a/41467_2024_54071_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e595/11557593/c85068681a8c/41467_2024_54071_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e595/11557593/082efa545078/41467_2024_54071_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e595/11557593/a1c4c0ddf379/41467_2024_54071_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e595/11557593/de70e594e966/41467_2024_54071_Fig7_HTML.jpg

Cited by

[1]
Empirical evaluation of artificial intelligence distillation techniques for ascertaining cancer outcomes from electronic health records.

NPJ Digit Med. 2025-6-10

[2]
Research trends and hotspots of circulating tumor DNA in colorectal cancer: a bibliometric study.

Front Oncol. 2025-5-14

[3]
New horizons at the interface of artificial intelligence and translational cancer research.

Cancer Cell. 2025-4-14

[4]
The clinical application of artificial intelligence in cancer precision treatment.

J Transl Med. 2025-1-27
