Zhou Li, Rueda Maria, Alkhateeb Abedalrhman
School of Computer Science, University of Windsor, Windsor, ON N9B 3P4, Canada.
Department of Chemistry and Biochemistry, University of Windsor, Windsor, ON N9B 3P4, Canada.
Cancers (Basel). 2022 Feb 13;14(4):934. doi: 10.3390/cancers14040934.
The Nottingham Prognostic Index (NPI) is a prognostic measure that predicts survival in operable primary breast cancer. The NPI value is calculated from the tumor size, the number of involved lymph nodes, and the tumor grade. Advances in next-generation sequencing have made it possible to measure many different biological indicators, collectively called multi-omics data. The availability of multi-omics data raises the challenge of integrating and analyzing these measures to understand disease progression. High-dimensional embedding techniques are used to represent the features in a lower dimension, i.e., as a 2-dimensional map. The dataset consists of three omics (gene expression, copy number alteration (CNA), and mRNA) from 1885 female patients. The model creates a gene similarity network (GSN) map for each omic using t-distributed stochastic neighbor embedding (t-SNE); the maps are then merged into a residual neural network (ResNet) classification model. The aim of this work was to (i) extract multi-omics biomarkers that are associated with the prognosis and prediction of breast cancer survival; and (ii) build a prediction model for multi-class breast cancer NPI classes. We evaluated this model and compared it with other combinations of high-dimensional embedding techniques and neural networks. The proposed model outperformed the other methods, with an accuracy of 98.48% and an area under the curve (AUC) of 0.9999. Findings in the literature confirm associations between some of the extracted biomarkers and breast cancer prognosis and survival, across the gene expression, CNA, and mRNA datasets.
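As an illustration of how an NPI value is derived from the three clinical inputs named above, the following is a minimal sketch, not the authors' code. It assumes the commonly cited form of the index, NPI = 0.2 x tumor size (cm) + lymph node stage + grade, with node stage 1 for no involved nodes, 2 for one to three, and 3 for four or more. The three-class cut-offs (3.4 and 5.4) are a widely used convention and may differ from the class definitions used in this study; the function names are hypothetical.

```python
def node_stage(positive_nodes: int) -> int:
    """Map the count of involved lymph nodes to the NPI node stage (1-3)."""
    if positive_nodes < 0:
        raise ValueError("positive_nodes must be non-negative")
    if positive_nodes == 0:
        return 1
    if positive_nodes <= 3:
        return 2
    return 3


def npi_score(size_cm: float, positive_nodes: int, grade: int) -> float:
    """Commonly cited NPI formula: 0.2 * size (cm) + node stage + grade."""
    if size_cm < 0:
        raise ValueError("size_cm must be non-negative")
    if grade not in (1, 2, 3):
        raise ValueError("grade must be 1, 2, or 3")
    return 0.2 * size_cm + node_stage(positive_nodes) + grade


def npi_class(score: float) -> str:
    """Assumed three-class convention; the study's own classes may differ."""
    if score <= 3.4:
        return "good"
    if score <= 5.4:
        return "moderate"
    return "poor"


if __name__ == "__main__":
    # Hypothetical patient: 2.5 cm tumor, 2 involved nodes, grade 3.
    score = npi_score(2.5, 2, 3)
    print(npi_class(score))  # prints "poor"
```

In the study, labels of this kind serve as the classification target, while the model inputs are the t-SNE maps built from the omics data.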