Multi-Omics Data Fusion via a Joint Kernel Learning Model for Cancer Subtype Discovery and Essential Gene Identification.

Author information

Feng Jie, Jiang Limin, Li Shuhao, Tang Jijun, Wen Lan

Affiliations

School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China.

School of Computational Science and Engineering, University of South Carolina, Columbia, SC, United States.

Publication information

Front Genet. 2021 Mar 4;12:647141. doi: 10.3389/fgene.2021.647141. eCollection 2021.

Abstract

The multiple origins of cancer give it multiple causes, and the same cancer can comprise many different subtypes. Identifying cancer subtypes is a key part of personalized cancer treatment and provides an important reference for clinical diagnosis and treatment. Some studies have shown that genetic and epigenetic profiles differ significantly among cancer subtypes during carcinogenesis and development. In this study, we first collect seven cancer datasets from the Broad Institute GDAC Firehose, each including a gene expression profile, an isoform expression profile, DNA methylation expression data, and the corresponding survival information. We then employ kernel principal component analysis (kernel PCA) to extract features from each expression profile, convert the extracted features into three similarity kernel matrices with a Gaussian kernel function, and fuse these matrices into a global kernel matrix. Finally, we apply spectral clustering to the global kernel matrix to obtain the clustering of cancer subtypes. In the experiments, besides using the p-value from the Cox regression model and survival analysis as the primary evaluation measures, we also introduce statistical indicators such as the Rand index (RI) and adjusted Rand index (ARI) to verify clustering performance. Then, combining the clustering results with the gene expression profile, we obtain the genes differentially expressed among subtypes by gene set enrichment analysis. For lung cancer, GMPS, EPHA10, C10orf54, and MAGEA6 are highly expressed in different subtypes; for liver cancer, CMYA5, DEPDC6, FAU, VPS24, RCBTB2, LOC100133469, and SLC35B4 are significantly expressed in different subtypes.
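The kernel construction and the clustering indices named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names, the `sigma` bandwidth, and the toy data are hypothetical, and the uniform-weight fusion is a simple stand-in assumed here for illustration, whereas the paper learns the fused kernel jointly. Kernel PCA and spectral clustering are omitted.

```python
import math
from collections import Counter
from itertools import combinations


def gaussian_kernel(rows, sigma=1.0):
    """Return the n x n Gaussian similarity matrix for a list of feature vectors."""
    n = len(rows)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(rows[i], rows[j]))
            kernel[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return kernel


def fuse_kernels(kernels, weights=None):
    """Weighted sum of kernel matrices.

    ASSUMPTION: uniform weights by default. The paper's joint kernel
    learning model is not reproduced here.
    """
    if weights is None:
        weights = [1.0 / len(kernels)] * len(kernels)
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs on which two labelings agree (needs >= 2 samples)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def adjusted_rand_index(labels_a, labels_b):
    """Rand index corrected for chance agreement (needs >= 2 samples)."""
    n = len(labels_a)
    contingency = Counter(zip(labels_a, labels_b))
    sum_cells = sum(math.comb(c, 2) for c in contingency.values())
    sum_a = sum(math.comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(math.comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / math.comb(n, 2)
    max_index = (sum_a + sum_b) / 2.0
    if max_index == expected:  # degenerate case: identical trivial labelings
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Toy usage: three hypothetical "omics" views of the same four samples.
views = [
    [[0.0, 0.1], [0.1, 0.0], [2.0, 2.1], [2.1, 2.0]],
    [[1.0], [1.1], [3.0], [3.2]],
    [[0.5, 0.5, 0.5], [0.4, 0.6, 0.5], [1.5, 1.4, 1.6], [1.6, 1.5, 1.5]],
]
global_kernel = fuse_kernels([gaussian_kernel(v, sigma=1.0) for v in views])
```

In a fuller pipeline, `global_kernel` would be passed to a spectral clustering routine as a precomputed affinity matrix, and the resulting labels compared against reference labels with `rand_index` and `adjusted_rand_index`.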
