Zhu Weiwei, Jiang Xiaodong, Zhang Lei, Zhou Peng, Xie Xinping, Wang Hongqiang
University of Science and Technology of China, Hefei, Anhui, 230026, China.
Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, Anhui, 230031, China.
J Transl Med. 2025 Aug 5;23(1):864. doi: 10.1186/s12967-025-06795-7.
BACKGROUND: Because tumor genetic heterogeneity is complex, personalized medicine has progressively become a central focus of cancer research. However, accurately predicting a patient's drug response before treatment begins remains a critical challenge for the field.

METHODS: This paper proposes DrugBERT, a BERT-based framework that integrates LDA topic embeddings and a drug efficacy-aware mechanism to predict the efficacy of antitumor drugs. The method incorporates LDA-generated topic embeddings into the BERT language model as a semantic enhancement module and introduces a drug efficacy-aware attention mechanism to prioritize semantic features related to drug efficacy. The model uses an LSTM to capture long-range dependencies in clinical text data. In addition, the SMOTE algorithm is used to synthesize minority-class samples to address data imbalance.

RESULTS: DrugBERT performed strongly on a dataset of 958 patients with non-small cell cancer treated with antitumor drugs. Furthermore, when validated on an independent dataset of 266 bowel cancer patients, the model achieved a 3% improvement in AUC over previous methods, indicating robust generalization capability.

CONCLUSIONS: DrugBERT can help predict the efficacy of antitumor drugs from clinical text while exhibiting strong generalization capability. These findings highlight the potential of language models for optimizing personalized therapeutic strategies.
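The oversampling step named in METHODS can be illustrated with a minimal sketch of SMOTE-style interpolation in plain Python. This is not the authors' implementation: the function name `smote`, its parameters (`n_new`, `k`, `seed`), and the toy two-dimensional vectors standing in for text embeddings are all assumptions made for illustration. The sketch only shows the core idea that each synthetic sample lies on the line segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority-class neighbours.

    Hypothetical helper for illustration; not the DrugBERT code.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy feature vectors standing in for text embeddings (assumed values).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2)
print(len(new_points))
```

In practice, oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into validation or test data.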