Cui Tongtong, Wang Zeyuan, Gu Hong, Qin Pan, Wang Jia
Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning, China.
Department of Breast Surgery, Second Hospital of Dalian Medical University, Dalian, Liaoning, China.
Front Genet. 2023 Feb 2;14:1095976. doi: 10.3389/fgene.2023.1095976. eCollection 2023.
In the pursuit of precision medicine for cancer, a promising step is to predict drug response through data mining, which can support clinical decisions for cancer patients. Although several machine learning methods for predicting drug response from genomic data already exist, most of them produce only point predictions, which cannot reveal the distribution of the predicted results. In this paper, we propose two regression methods: a three-layer feature selection combined with a gamma-distribution-based generalized linear model (GLM), and a two-layer feature selection combined with an artificial neural network (ANN). Both methods are applied to the Cancer Cell Line Encyclopedia (CCLE) and the Genomics of Drug Sensitivity in Cancer (GDSC) datasets. Under ten-fold cross-validation, our methods achieve higher accuracy on anticancer drug response prediction than existing methods, with an R² and root mean square error (RMSE) of 0.87 and 0.53, respectively. Data validation illustrates the importance of assessing the reliability of predictions through predicted confidence intervals, and the role of such intervals in personalized medicine. A correlation analysis of the genes selected by the three-layer feature selection also supports the effectiveness of the proposed methods.
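To make the contrast between a point prediction and an interval prediction concrete, the following is a minimal sketch of a gamma GLM that returns a confidence interval for each predicted mean response. It is not the authors' implementation. The log link, the iteratively reweighted least squares (IRLS) fit, the Wald-type interval, the synthetic data, and the names `fit_gamma_glm` and `predict_with_interval` are all assumptions made for illustration; the abstract does not specify the link function, the interval construction, or the feature selection details, and no feature selection is performed here.

```python
import numpy as np


def fit_gamma_glm(X, y, n_iter=50, tol=1e-10):
    """Fit a gamma GLM with a log link by IRLS (illustrative assumption).

    X must contain an intercept column of ones as its first column.
    Returns the coefficient vector and its estimated covariance matrix.
    """
    n, p = X.shape
    beta = np.zeros(p)
    beta[0] = np.log(y.mean())  # start from the intercept-only fit
    # For gamma + log link the IRLS weights are all 1, so (X'X)^-1 is fixed.
    xtx_inv = np.linalg.inv(X.T @ X)
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu  # working response
        beta_new = xtx_inv @ (X.T @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    mu = np.exp(X @ beta)
    # Pearson estimate of the dispersion parameter.
    phi = np.sum(((y - mu) / mu) ** 2) / (n - p)
    return beta, phi * xtx_inv


def predict_with_interval(X_new, beta, cov, z=1.96):
    """Return predicted means with approximate 95% confidence bounds."""
    eta = X_new @ beta
    # Standard error of the linear predictor for each row of X_new.
    se = np.sqrt(np.einsum("ij,jk,ik->i", X_new, cov, X_new))
    return np.exp(eta), np.exp(eta - z * se), np.exp(eta + z * se)


# Synthetic stand-in for (selected gene features, positive drug response).
rng = np.random.default_rng(0)
n_samples = 500
features = rng.normal(size=(n_samples, 2))
X = np.column_stack([np.ones(n_samples), features])
true_beta = np.array([0.5, 0.3, -0.2])  # hypothetical coefficients
shape = 20.0
y = rng.gamma(shape, np.exp(X @ true_beta) / shape)

beta_hat, cov_hat = fit_gamma_glm(X, y)
mean, lower, upper = predict_with_interval(X[:5], beta_hat, cov_hat)
for m, lo, hi in zip(mean, lower, upper):
    print(f"predicted response {m:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

The interval here is built on the linear-predictor scale and then exponentiated, which keeps both bounds positive, as a gamma-distributed response requires. A wide interval flags a prediction that should be treated with more caution than its point estimate alone suggests.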