Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, Punjab 147001, India.
IET Syst Biol. 2020 Jun;14(3):160-169. doi: 10.1049/iet-syb.2019.0087.
Breast cancer is the second leading cause of death in the world. Breast cancer research focuses on early prediction, diagnosis, and prognosis. Breast cancer can be predicted from omics profiles, clinical tests, and pathological images. The omics profiles comprise genomic, proteomic, and transcriptomic profiles, which are available as high-dimensional datasets. Survival prediction is carried out on omics data for early prediction of disease onset, relapse, and recurrence, and for biomarker identification. Early prediction of breast cancer is desirable for effective treatment, as delay can allow the cancer to progress to a later stage. In this study, an extreme learning machine (ELM)-based model for breast cancer survival prediction, named eBreCaP, is proposed. It integrates genomic (gene expression, copy number alteration, DNA methylation, protein expression) and pathological image datasets, and trains on them using an ensemble of ELM with six models chosen as best suited to the integrated data. eBreCaP has been evaluated on nine performance parameters, namely sensitivity, specificity, precision, accuracy, Matthews correlation coefficient, area under the curve, area under the precision-recall curve, hazard ratio, and concordance index. eBreCaP achieved an accuracy of 85% for early breast cancer survival prediction using the ensemble of ELM with gradient boosting.
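As an illustration of the threshold-based evaluation parameters listed above, the following minimal sketch computes sensitivity, specificity, precision, accuracy, and the Matthews correlation coefficient from binary labels. It is not the authors' evaluation code: the function names `confusion_counts` and `classification_metrics`, the label encoding (1 for the positive class, 0 for the negative class), and the toy labels are all assumptions made for this example.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels (assumed encoding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Return five threshold-based metrics as a dict (hypothetical helper)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)

    def ratio(num, den):
        # Return 0.0 when the denominator is zero instead of raising.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Toy labels, invented for illustration only (not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

The remaining parameters (area under the curve, area under the precision-recall curve, hazard ratio, and concordance index) depend on continuous scores or survival times rather than hard labels, so they are left out of this sketch.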