Department of Computer Science and Technology, Tsinghua University, Beijing, China.
School of Software, Shandong University, Jinan, China.
Sci Rep. 2023 Aug 3;13(1):12595. doi: 10.1038/s41598-023-39543-2.
Machine learning (ML) has been widely used in assisted disease diagnosis and prediction systems to reduce heavy dependence on medical resources and improve healthcare quality. With the rise of pre-trained language models (PLMs), the prospects for applying machine learning methods in this field have broadened further. PLMs have recently achieved great success in diverse text processing tasks. However, because of the significant semantic gap between the pre-training corpus and structured electronic health records (EHRs), PLMs fail to converge to the anticipated disease diagnosis and prediction results. Moreover, connecting PLMs to EHRs typically requires extracting curated predictor variables from structured EHR resources, which is tedious and labor-intensive and discards a large amount of implicit information. In this work, we propose an Input Prompting and Discriminative language model with the Mixture-of-experts framework (IPDM), which strengthens the model's ability to learn from heterogeneous information and improves its feature awareness. Furthermore, by leveraging the prompt-tuning mechanism, IPDM carries the benefits of pre-training over to downstream tasks with only minor modifications. Experiments on one disease diagnosis task and two disease prediction tasks show that IPDM markedly outperforms existing models. Finally, few-feature and few-sample experiments demonstrate that IPDM remains stable and performs strongly when predicting chronic diseases with unclear early-onset characteristics or sudden-onset diseases with insufficient data. These results support the advantage of IPDM over existing mainstream methods and indicate that it can address the challenges above by providing a stable, low-resource medical diagnostic system for various clinical scenarios.