
Generalizing machine learning models from clinical free text.

Author Information

Pandian Balaji, Vandervest John, Mentz Graciela, Varghese Jomy, Steadman Shavano D, Kheterpal Sachin, Makar Maggie, Vydiswaran V G Vinod, Burns Michael L

Affiliations

Department of Anesthesiology, Weill Cornell Medicine, New York, NY, USA.

Department of Anesthesiology, University of Michigan, 1500 East Medical Center Drive, 1H247 UH, SPC 5048, Ann Arbor, MI, 48109-5048, USA.

Publication Information

Sci Rep. 2025 Aug 28;15(1):31668. doi: 10.1038/s41598-025-17197-6.

Abstract

To assess strategies for enhancing the generalizability of healthcare artificial intelligence models, we analyzed the impact of preprocessing approaches applied to medical free text, compared single- versus multiple-institution data models, and evaluated data divergence metrics. From 1,607,393 procedures across 44 U.S. institutions, deep neural network models were created to classify anesthesiology Current Procedural Terminology codes from medical free text. Three levels of text preprocessing were analyzed, from minimal to automated (cSpell) with comprehensive physician review. Kullback-Leibler divergence and k-medoid clustering were used to predict single- versus multiple-institution model performance. Single-institution models showed a mean accuracy of 92.5% [2.8% SD] and an F1 of 0.923 [0.029] on internal data but generalized poorly to external data (-22.4% [7.0%]; -0.223 [0.081]). Free-text preprocessing minimally altered performance (+0.51% [2.23%]; +0.004 [0.020]). An all-institution model performed worse on internal data (-4.88% [2.43%]; -0.045 [0.020]) but generalized better to external data (+17.1% [8.7%]; +0.182 [0.073]). Compared to vocabulary overlap and Jaccard similarity, Kullback-Leibler divergence correlated more strongly with model performance (R of 0.41 vs. 0.16 vs. 0.08, respectively) and successfully clustered institutions and identified outlier data. Overall, preprocessing medical free text showed limited utility in improving the generalization of machine learning models; single-institution models performed best but generalized poorly, while combined-data models improved generalization but never matched the internal performance of single-institution models. Kullback-Leibler divergence provided valuable insight as a reliable heuristic for evaluating generalizability. These results have important implications for developing broadly applicable artificial intelligence healthcare applications, providing valuable insight into their development and evaluation.
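The abstract compares Kullback-Leibler divergence against Jaccard similarity as heuristics for how well a model trained at one institution will transfer to another. A minimal sketch of both metrics over unigram token statistics is shown below; the toy procedure notes, smoothing constant, and function names are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter
import math

def unigram_distribution(texts, vocab, alpha=1.0):
    """Additively smoothed unigram probabilities over a shared vocabulary."""
    counts = Counter(tok for t in texts for tok in t.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(p, q):
    """D_KL(P || Q) in nats; p and q share the same support."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

def jaccard_similarity(texts_a, texts_b):
    """|A ∩ B| / |A ∪ B| over the two token sets."""
    a = {tok for t in texts_a for tok in t.lower().split()}
    b = {tok for t in texts_b for tok in t.lower().split()}
    return len(a & b) / len(a | b)

# Hypothetical free-text procedure notes from two institutions, which
# describe similar procedures with different local vocabulary.
site_a = ["lap chole general anesthesia", "total knee arthroplasty spinal"]
site_b = ["laparoscopic cholecystectomy GA", "TKA spinal anesthetic"]

# Shared vocabulary so both distributions have identical support.
vocab = sorted({tok for t in site_a + site_b for tok in t.lower().split()})
p = unigram_distribution(site_a, vocab)
q = unigram_distribution(site_b, vocab)

print(f"KL divergence:      {kl_divergence(p, q):.3f}")
print(f"Jaccard similarity: {jaccard_similarity(site_a, site_b):.3f}")
```

Unlike Jaccard similarity, which only measures token-set overlap, KL divergence weights each token by how often it actually appears, which is one plausible reason it tracks cross-institution model performance more closely in the study.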

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e316/12391454/c827443517e3/41598_2025_17197_Fig1_HTML.jpg
