Elucidating Cancer Subtypes by Using the Relationship between DNA Methylation and Gene Expression.

Author information

Department of Computer Science, University of Massachusetts Boston, Boston, MA 02125, USA.

Department of Mathematics, University of Massachusetts Boston, Boston, MA 02125, USA.

Publication information

Genes (Basel). 2024 May 16;15(5):631. doi: 10.3390/genes15050631.

Abstract

Advances in next-generation sequencing (NGS) have generated vast amounts of data for the same set of subjects. The resulting challenge is how to combine and reconcile results from different omics studies, such as the epigenome and transcriptome, to improve the classification of disease subtypes. In this study, we introduce sCClust (sparse canonical correlation analysis with clustering), a technique that combines high-dimensional omics data using sparse canonical correlation analysis (sCCA) such that the correlation between datasets is maximized. The integrated data are then clustered in a lower-dimensional space. We apply sCClust to gene expression and DNA methylation data from three cancer genomics datasets in The Cancer Genome Atlas (TCGA) to distinguish between underlying subtypes. We evaluate the identified subtypes using Kaplan-Meier plots and hazard ratio analysis on three types of cancer: GBM (glioblastoma multiforme), lung cancer, and colon cancer. Comparison with subtypes identified by both single- and multi-omics studies suggests improved clinical association. We also perform pathway over-representation analysis to identify up-regulated and down-regulated genes as tentative drug targets. The goal of the paper is twofold: integrating epigenomic and transcriptomic datasets, then elucidating subtypes in the latent space. The significance of this study lies in the enhanced categorization of cancer data, which is crucial to precision medicine.
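
The two-stage idea in the abstract (sparse CCA, then clustering in the latent space) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a simplified sparse CCA (alternating soft-thresholded updates on the cross-correlation matrix, in the spirit of penalized matrix decomposition), k-means as the clustering step, and concatenated canonical variates as the latent space; the abstract specifies none of these. All function names, penalty values, and the toy data are hypothetical.

```python
import numpy as np

def standardize(A):
    """Center each column and scale it to unit variance."""
    sd = A.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant features
    return (A - A.mean(axis=0)) / sd

def soft_threshold(a, t):
    """Shrink entries toward zero; entries with |a| <= t become exactly 0."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def unit(a):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    n = np.linalg.norm(a)
    return a / n if n > 0 else a

def sparse_cca(X, Y, n_components=2, lam_x=0.3, lam_y=0.3, n_iter=100):
    """Simplified sparse CCA with deflation (an assumption, not the paper's solver).
    lam_x and lam_y in [0, 1) are thresholds relative to the largest entry."""
    Xs, Ys = standardize(X), standardize(Y)
    K = Xs.T @ Ys / X.shape[0]                 # cross-correlation matrix (p x q)
    U = np.zeros((X.shape[1], n_components))
    V = np.zeros((Y.shape[1], n_components))
    for c in range(n_components):
        v = np.linalg.svd(K, full_matrices=False)[2][0]  # dense starting point
        for _ in range(n_iter):
            a = K @ v
            u = unit(soft_threshold(a, lam_x * np.abs(a).max()))
            b = K.T @ u
            v = unit(soft_threshold(b, lam_y * np.abs(b).max()))
        K = K - (u @ K @ v) * np.outer(u, v)   # deflate before the next component
        U[:, c], V[:, c] = u, v
    return U, V

def kmeans(Z, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(Z - c, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def make_toy_data(n=60, p=50, q=40, seed=0):
    """Synthetic stand-in for two omics views: 5 features per view carry a
    shared two-group signal, the rest are pure noise."""
    rng = np.random.default_rng(seed)
    truth = np.repeat([0, 1], n // 2)
    z = np.where(truth == 0, -2.0, 2.0)
    X = rng.normal(size=(n, p))
    Y = rng.normal(size=(n, q))
    X[:, :5] += z[:, None]
    Y[:, :5] += z[:, None]
    return X, Y, truth

X, Y, truth = make_toy_data()
U, V = sparse_cca(X, Y, n_components=2)
# Latent space: canonical variates of both views, side by side (an assumption).
latent = np.hstack([standardize(X) @ U, standardize(Y) @ V])
pred = kmeans(latent, k=2)
agreement = max(np.mean(pred == truth), np.mean(pred != truth))
print("non-zero weights in first X component:", np.count_nonzero(U[:, 0]))
print("agreement with planted groups:", agreement)
```

In this sketch the sparsity penalty zeroes out most feature weights, so each canonical component depends on a small set of features from each view. On real expression and methylation matrices the penalties and the number of clusters would need to be tuned, for example by cross-validation or by the survival-based evaluation the abstract describes.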

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/46e5/11121157/dbdc2616f4b4/genes-15-00631-g001.jpg
