GMMchi: gene expression clustering using Gaussian mixture modeling.

Author information

Cancer and Immunogenetics Laboratory, Weatherall Institute of Molecular Medicine, Department of Oncology, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK.

Harvard Medical School, Boston, MA, 02115, USA.

Publication information

BMC Bioinformatics. 2022 Nov 2;23(1):457. doi: 10.1186/s12859-022-05006-0.

Abstract

BACKGROUND

Cancer evolution consists of a stepwise acquisition of genetic and epigenetic changes that alter the gene expression profiles of cells in a particular tissue and produce phenotypic alterations acted upon by natural selection. The recurrent appearance of specific genetic lesions across individual cancers and cancer types suggests the existence of certain "driver mutations," which likely contribute most to a tumor's selective advantage over surrounding normal tissue and are therefore responsible for the most consequential aspects of the cancer cells' gene expression patterns and phenotypes. We hypothesize that such mutations are likely to cluster with specific dichotomous shifts in the expression of the genes they most closely control. To analyze such correlations using 2 × 2 contingency table statistics, we propose GMMchi, a Python package that leverages Gaussian mixture modeling to detect and characterize bimodal gene expression patterns across cancer samples.
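The general idea of dichotomizing expression with a Gaussian mixture and then testing association in a 2 × 2 table can be illustrated with a minimal sketch. This is not the GMMchi API: the function names, the simulated data, and the hypothetical binary "mutation status" are all assumptions made for illustration, and the sketch omits steps a real package would need, such as choosing the number of components or handling non-normal tails. It uses only the Python standard library.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_component_gmm(values, n_iter=200):
    """Fit a 1-D, two-component Gaussian mixture by EM (illustrative only)."""
    xs = sorted(values)
    n = len(xs)
    lower, upper = xs[: n // 2], xs[n // 2:]
    # Initialise the means from the lower and upper halves of the sorted data.
    means = [sum(lower) / len(lower), sum(upper) / len(upper)]
    grand_mean = sum(xs) / n
    sd0 = max(math.sqrt(sum((x - grand_mean) ** 2 for x in xs) / n), 1e-3)
    sds = [sd0, sd0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample.
        resp = []
        for x in values:
            p = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and standard deviations.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)  # floor avoids a collapsed component
    return weights, means, sds


def dichotomize(values, params):
    """Return True for samples assigned to the higher-mean component."""
    weights, means, sds = params
    high = 0 if means[0] > means[1] else 1
    low = 1 - high
    return [
        weights[high] * normal_pdf(x, means[high], sds[high])
        > weights[low] * normal_pdf(x, means[low], sds[low])
        for x in values
    ]


def chi_square_2x2(a_calls, b_calls):
    """Pearson chi-square test (no continuity correction) on a 2 x 2 table."""
    table = [[0, 0], [0, 0]]
    for a, b in zip(a_calls, b_calls):
        table[int(a)][int(b)] += 1
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the chi-square tail is erfc(sqrt(stat / 2)).
    return table, stat, math.erfc(math.sqrt(stat / 2.0))


# Simulated example: 90 samples, half carrying a hypothetical mutation that
# shifts expression of one gene from a low to a high mode.
rng = random.Random(0)
status = [i < 45 for i in range(90)]
gene = [rng.gauss(9.0 if s else 4.0, 0.6) for s in status]

params = fit_two_component_gmm(gene)
calls = dichotomize(gene, params)
table, stat, p_value = chi_square_2x2(status, calls)
print(table, round(stat, 2), p_value)
```

The two modes here are deliberately well separated, so the sketch says nothing about performance on realistic data; it only shows how a continuous expression vector becomes a binary call that can enter a contingency table.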

RESULTS

Using well-defined simulated data, we confirmed the robust performance of GMMchi, which reached 85% accuracy with a sample size of n = 90. We also demonstrate several applications of GMMchi: characterizing background fluorescent signals in microarray data, filtering out uninformative background probe sets, and uncovering novel genetic interrelationships and tumor characteristics. Our approach to analyzing gene expression in cancers provides an additional lens that supplements traditional continuous-valued statistical analysis by maximizing the information that can be gathered from bulk gene expression data.

CONCLUSIONS

We confirm that GMMchi robustly and reliably extracts bimodal patterns from both colorectal cancer (CRC) cell line-derived microarray and tumor-derived RNA-Seq data and verify previously reported gene expression correlates of some well-characterized CRC phenotypes.

AVAILABILITY

The Python package GMMchi and the cell line microarray data used in this paper are available for download on GitHub at https://github.com/jeffliu6068/GMMchi .
