Empirical evaluation of artificial intelligence distillation techniques for ascertaining cancer outcomes from electronic health records.

Authors

Riaz Irbaz Bin, Naqvi Syed Arsalan Ahmed, Ashraf Noman, Harris Gordon J, Kehl Kenneth L

Affiliations

Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA.

Mayo Clinic, Phoenix, AZ, USA.

Publication

NPJ Digit Med. 2025 Jun 10;8(1):347. doi: 10.1038/s41746-025-01646-7.

Abstract

Phenotypic information for cancer research is embedded in unstructured electronic health records (EHRs) and requires effort to extract. Deep learning models can automate this extraction but face scalability issues due to privacy concerns. We evaluated techniques for applying a teacher-student framework to extract longitudinal clinical outcomes from EHRs. We focused on the challenging task of ascertaining two cancer outcomes, overall response and progression according to Response Evaluation Criteria in Solid Tumors (RECIST), from free-text radiology reports. "Teacher" models with a hierarchical Transformer architecture were trained on data from Dana-Farber Cancer Institute (DFCI). These models labeled public datasets (MIMIC-IV, Wiki-text) and GPT-4-generated synthetic data. "Student" models were then trained to mimic the teachers' predictions. The DFCI teacher models achieved high performance, and student models trained on MIMIC-IV data showed comparable results, demonstrating effective knowledge transfer. However, student models trained on Wiki-text and synthetic data performed worse, emphasizing the need for in-domain public datasets for model distillation.
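
The teacher-student step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper uses hierarchical Transformers over radiology reports, whereas the toy below swaps in a fixed logistic rule as the "teacher" and a logistic-regression "student" on two numeric features. All names (`teacher_predict`, `train_student`, `student_predict`), the teacher's weights, and the random "public" data are assumptions made for illustration only. It shows just the core idea: the teacher assigns soft labels to data it was not trained on, and the student is fit to those soft labels rather than to private ground truth.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def teacher_predict(x):
    # Hypothetical stand-in for a teacher trained on private data:
    # a fixed logistic rule returning the probability of the outcome.
    return sigmoid(2.0 * x[0] - 1.0 * x[1])


def student_predict(x, w, b):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_student(features, soft_labels, lr=1.0, epochs=1000):
    # Full-batch gradient descent on cross-entropy against the teacher's
    # soft labels (probabilities), not against hard ground-truth labels.
    w = [0.0] * len(features[0])
    b = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, target in zip(features, soft_labels):
            err = student_predict(x, w, b) - target  # d(loss)/d(logit)
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


random.seed(0)
# Assumed "public" dataset: the teacher never saw these points in training.
public_data = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]
soft_labels = [teacher_predict(x) for x in public_data]  # teacher labels public data
w, b = train_student(public_data, soft_labels)           # student mimics the teacher
```

In this toy setting the public data covers the same feature space the teacher operates on, which loosely mirrors the in-domain case in the abstract (MIMIC-IV). The sketch does not model the out-of-domain cases (Wiki-text, synthetic data), where the abstract reports weaker student performance.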

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/050e/12152177/e6c54eedbcb4/41746_2025_1646_Fig1_HTML.jpg
