Department of Hospital Pathology, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, South Korea.
PLoS One. 2020 Nov 9;15(11):e0241514. doi: 10.1371/journal.pone.0241514. eCollection 2020.
PIK3CA-mutated breast cancers of the hormone receptor-positive, HER2-negative subtype can be treated with PIK3CA inhibitors. We applied a supervised elastic net penalized logistic regression model to predict PIK3CA mutations from gene expression data. This regression approach was used for predictive modeling on the TCGA pan-cancer dataset, in which both PIK3CA mutation and mRNA expression data were available for approximately 10,000 cases. In 10-fold cross-validation, the model with λ = 0.01 and α = 1.0 (ridge regression) showed the best performance in terms of area under the receiver operating characteristic curve (AUROC). The final model was developed with the selected hyperparameters using the entire training set. The training set AUROC was 0.93, and the test set AUROC was 0.84. The area under the precision-recall curve (AUPR) was 0.66 for the training set and 0.39 for the test set. Cancer types were the most important predictors. Among the gene expression predictors, insulin-like growth factor 1 receptor (IGF1R) and phosphatase and tensin homolog (PTEN) were the most significant genes. Our study suggests that predicting genomic alterations from gene expression data is feasible, with good results.
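The workflow described in the abstract (elastic net penalized logistic regression, hyperparameter selection by 10-fold cross-validation, a final refit on the whole training set, then AUROC and AUPR on training and test sets) can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, every function and variable name is hypothetical, the optimizer (proximal gradient descent) is chosen only for brevity, and out-of-fold scores are pooled into a single AUROC rather than averaged per fold. The sketch uses a mixing weight `l1_ratio`, where 0.0 is a pure L2 penalty and 1.0 a pure L1 penalty; how this maps onto the α reported in the abstract is an assumption, since the abstract does not define its parameterization.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(value, t):
    # Proximal operator of the L1 penalty: shrink toward zero by t.
    if value > t:
        return value - t
    if value < -t:
        return value + t
    return 0.0

def predict_scores(X, w, b):
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]

def fit_elastic_net_logreg(X, y, lam, l1_ratio, lr=0.5, epochs=100):
    # Minimizes mean log-loss + lam * (l1_ratio * |w|_1 + (1 - l1_ratio) / 2 * |w|_2^2)
    # by proximal gradient descent; the intercept b is not penalized.
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        errors = [s - yi for s, yi in zip(predict_scores(X, w, b), y)]
        b -= lr * sum(errors) / n
        for j in range(p):
            grad = sum(e * row[j] for e, row in zip(errors, X)) / n
            smooth_step = w[j] - lr * (grad + lam * (1.0 - l1_ratio) * w[j])
            w[j] = soft_threshold(smooth_step, lr * lam * l1_ratio)
    return w, b

def auroc(y_true, scores):
    # Probability that a random positive outranks a random negative (ties count 0.5).
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if a > c else 0.5 if a == c else 0.0 for a in pos for c in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    # Step-wise area under the precision-recall curve (no special handling of ties).
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits

def out_of_fold_auroc(X, y, lam, l1_ratio, k=10):
    # k-fold cross-validation; out-of-fold scores are pooled into one AUROC.
    scores = [0.0] * len(X)
    for fold in range(k):
        test_idx = list(range(fold, len(X), k))
        held = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held]
        w, b = fit_elastic_net_logreg([X[i] for i in train_idx],
                                      [y[i] for i in train_idx], lam, l1_ratio)
        for i, s in zip(test_idx, predict_scores([X[i] for i in test_idx], w, b)):
            scores[i] = s
    return auroc(y, scores)

# Synthetic stand-in for expression features and mutation labels (assumption).
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(4)] for _ in range(150)]
y = [1 if random.random() < sigmoid(2.0 * r[0] - 1.5 * r[1] - 1.0) else 0 for r in X]
X_train, y_train, X_test, y_test = X[:120], y[:120], X[120:], y[120:]

# Select (lam, l1_ratio) by 10-fold cross-validation on the training set only.
grid = [(lam, ratio) for lam in (0.01, 0.1) for ratio in (0.0, 1.0)]
cv_results = {params: out_of_fold_auroc(X_train, y_train, *params) for params in grid}
best_lam, best_ratio = max(cv_results, key=cv_results.get)

# Refit on the entire training set with the selected hyperparameters.
w, b = fit_elastic_net_logreg(X_train, y_train, best_lam, best_ratio)
train_scores = predict_scores(X_train, w, b)
test_scores = predict_scores(X_test, w, b)
train_auroc, test_auroc = auroc(y_train, train_scores), auroc(y_test, test_scores)
train_aupr = average_precision(y_train, train_scores)
test_aupr = average_precision(y_test, test_scores)

print("selected lam, l1_ratio:", best_lam, best_ratio)
print("AUROC train/test:", round(train_auroc, 3), round(test_auroc, 3))
print("AUPR  train/test:", round(train_aupr, 3), round(test_aupr, 3))
```

Reporting AUPR alongside AUROC matters when the positive class is a minority, as mutation labels usually are: AUROC is insensitive to class balance, whereas AUPR falls as positives become rarer, which is consistent with the gap between the two metrics reported in the abstract.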