

Advancing Pancreatic Cancer Prediction with a Next Visit Token Prediction Head on Top of Med-BERT.

Author information

He Jianping, Rasmy Laila, Zhi Degui, Tao Cui

Affiliations

McWilliams School of Biomedical Informatics, UTHealth at Houston, Houston, TX 77030, USA.

Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL 32224, USA.

Publication information

Cancers (Basel). 2025 Feb 4;17(3):516. doi: 10.3390/cancers17030516.

Abstract

BACKGROUND

Electronic Health Records (EHRs) contain valuable data for disease prediction. Artificial intelligence (AI), particularly deep learning, substantially enhances disease prediction by analyzing large EHR datasets to identify hidden patterns, which facilitates early detection. Recently, numerous foundation models pretrained on large datasets have proven effective for EHR-based disease prediction. However, questions remain about how best to use such models, especially when the fine-tuning cohorts are very small.

METHODS

We used Med-BERT, an EHR-specific foundation model, and reformulated the binary disease prediction task in two ways: as a token prediction task and as a next-visit mask token prediction task. Both reformulations align the task with the format of Med-BERT's pretraining task. The goal was to improve the accuracy of pancreatic cancer (PaCa) prediction in both few-shot and fully supervised settings.
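The next-visit mask formulation can be illustrated with a minimal input-preparation sketch. A patient's visits are flattened into a code sequence, and a new "next visit" is appended that contains only a mask token. The model is then asked what code belongs at that position.

The following are hypothetical and not taken from the Med-BERT codebase:
- the token name `[MASK]`,
- the helper functions,
- the flattened code/visit-id layout, which only loosely mirrors Med-BERT's code and visit embeddings,
- the example ICD-10 codes.

```python
from typing import Dict, List, Tuple

# Hypothetical special tokens; Med-BERT's actual vocabulary may differ.
PAD = "[PAD]"
MASK = "[MASK]"


def build_vocab(patients: List[List[List[str]]]) -> Dict[str, int]:
    """Assign an integer id to every diagnosis code seen across all patients."""
    vocab = {PAD: 0, MASK: 1}
    for visits in patients:
        for visit in visits:
            for code in visit:
                if code not in vocab:
                    vocab[code] = len(vocab)
    return vocab


def encode_with_next_visit_mask(
    visits: List[List[str]], vocab: Dict[str, int]
) -> Tuple[List[int], List[int], int]:
    """Flatten visits into token ids plus 1-based visit ids, then append a
    masked 'next visit' whose token the model must predict.

    Returns (token_ids, visit_ids, mask_position). Raises KeyError for codes
    missing from the vocabulary.
    """
    token_ids: List[int] = []
    visit_ids: List[int] = []
    for visit_idx, visit in enumerate(visits, start=1):
        for code in visit:
            token_ids.append(vocab[code])
            visit_ids.append(visit_idx)
    mask_position = len(token_ids)
    token_ids.append(vocab[MASK])          # the single token of the next visit
    visit_ids.append(len(visits) + 1)      # it belongs to a new, later visit
    return token_ids, visit_ids, mask_position


# Illustrative history: type 2 diabetes and chronic pancreatitis, then epigastric pain.
history = [["E11.9", "K86.1"], ["R10.13"]]
vocab = build_vocab([history])
print(encode_with_next_visit_mask(history, vocab))  # ([2, 3, 4, 1], [1, 1, 2, 3], 3)
```

Because the masked position is scored with the same kind of token-prediction head used during pretraining, no separate classification layer is needed for this formulation.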

RESULTS

Reformulating the task as a token prediction task (Med-BERT-Sum) yielded slightly better performance than the conventional formulation, both in few-shot scenarios and with larger training samples. Reformulating the task as a next-visit mask token prediction task (Med-BERT-Mask) performed better still. It significantly outperformed the conventional binary classification (BC) task (Med-BERT-BC) by 3% to 7% in few-shot scenarios with 10 to 500 training samples. These findings show that aligning the downstream task with Med-BERT's pretraining objectives substantially enhances the model's predictive capabilities. This improves its effectiveness in predicting both rare and common diseases.
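To make the contrast between formulations concrete, here is a hypothetical scoring sketch. It assumes a pretrained masked-language-model head that produces vocabulary logits at every position. Two functions compute a patient-level score:
- `mask_token_score` reads the probability of PaCa-related codes at the appended mask position (the Med-BERT-Mask idea).
- `summed_token_score` sums that probability over all positions. This is only one assumed reading of how Med-BERT-Sum aggregates token predictions; the abstract does not specify the aggregation.

The set of PaCa code ids is also an assumption.

```python
from typing import Sequence

import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def mask_token_score(
    logits: np.ndarray, mask_position: int, target_ids: Sequence[int]
) -> float:
    """Probability mass assigned to the target (e.g., PaCa) codes at the masked
    next-visit position. `logits` has shape (seq_len, vocab_size)."""
    probs = softmax(logits[mask_position])
    return float(probs[list(target_ids)].sum())


def summed_token_score(logits: np.ndarray, target_ids: Sequence[int]) -> float:
    """Assumed Sum-style readout: target-code probability summed over all
    positions. This is a ranking score and can exceed 1."""
    probs = softmax(logits, axis=-1)
    return float(probs[:, list(target_ids)].sum())


# Toy logits for 3 positions over a 4-token vocabulary (all zeros -> uniform).
toy_logits = np.zeros((3, 4))
print(mask_token_score(toy_logits, mask_position=2, target_ids=[3]))  # 0.25
print(summed_token_score(toy_logits, target_ids=[3]))                 # 0.75
```

A plausible design rationale, consistent with the abstract's argument, is that both readouts reuse the pretrained output head. The BC setup instead attaches a new, randomly initialized classifier, which is harder to fit from a very small fine-tuning cohort.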

CONCLUSIONS

Reformatting disease prediction tasks to align with the pretraining of foundation models enhances prediction accuracy, leading to earlier detection and timely intervention. This approach improves treatment effectiveness, survival rates, and overall patient outcomes for PaCa and potentially other cancers.


