A mixture model to detect edges in sparse co-expression graphs with an application for comparing breast cancer subtypes.

Affiliations

Department of Statistics, University of Connecticut, Storrs, CT, United States of America.

Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States of America.

Publication information

PLoS One. 2021 Feb 11;16(2):e0246945. doi: 10.1371/journal.pone.0246945. eCollection 2021.

Abstract

We develop a method to recover a gene network's structure from co-expression data, measured in terms of normalized Pearson's correlation coefficients between gene pairs. We treat these co-expression measurements as weights in the complete graph in which nodes correspond to genes. To decide which edges exist in the gene network, we fit a three-component mixture model such that the observed weights of 'null edges' follow a normal distribution with mean 0, and the non-null edges follow a mixture of two lognormal distributions, one for positively correlated and one for negatively correlated pairs. We show that this so-called L2N mixture model outperforms other methods in terms of power to detect edges, and that it allows control of the false discovery rate. Importantly, our method makes no assumptions about the true network structure. We demonstrate our method, which is implemented in an R package called edgefinder, using a large dataset consisting of expression values of 12,750 genes obtained from 1,616 women. We infer the gene network structure by cancer subtype and find informative subtype characteristics. For example, we find thirteen pathways that are enriched in each of the cancer groups but not in the Normal group, with two of the pathways associated with autoimmune diseases and two others with graft rejection. We also find characteristics specific to individual breast cancer subtypes. For example, the Luminal A network includes a single, highly connected cluster of genes, which is enriched in the human diseases category, and in the Her2 subtype network we find a distinct, highly interconnected cluster that is uniquely enriched in drug metabolism pathways.
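The three-component mixture described in the abstract can be illustrated with a short Python sketch. This is not the edgefinder implementation (which is an R package); it only evaluates a density of the stated form, with a mean-0 normal for null edges, a lognormal for positive weights, and a mirrored lognormal for negative weights. All function names and parameter values below are hypothetical and chosen for illustration, and the posterior null probability is shown as one natural quantity for ranking edges, not as the authors' false discovery rate procedure.

```python
import math


def normal_pdf(x, sigma):
    """Density of a normal distribution with mean 0 and standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def lognormal_pdf(x, mu, sigma):
    """Density of a lognormal distribution; zero for non-positive x."""
    if x <= 0.0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def l2n_components(w, p_pos, p_neg, sigma0, mu_pos, s_pos, mu_neg, s_neg):
    """Weighted densities of the null, positive, and negative components at weight w.

    The negative component is a lognormal on -w, so it only has mass for w < 0.
    """
    p_null = 1.0 - p_pos - p_neg
    null = p_null * normal_pdf(w, sigma0)
    pos = p_pos * lognormal_pdf(w, mu_pos, s_pos)
    neg = p_neg * lognormal_pdf(-w, mu_neg, s_neg)
    return null, pos, neg


def posterior_null(w, **params):
    """Posterior probability that an edge with weight w belongs to the null component."""
    null, pos, neg = l2n_components(w, **params)
    return null / (null + pos + neg)


# Hypothetical parameter values, for illustration only (not estimates from the paper).
PARAMS = dict(p_pos=0.03, p_neg=0.02, sigma0=1.0,
              mu_pos=1.5, s_pos=0.3, mu_neg=1.5, s_neg=0.3)

if __name__ == "__main__":
    for w in (-5.0, -1.0, 0.0, 1.0, 5.0):
        print(w, posterior_null(w, **PARAMS))
```

Under these illustrative parameters, weights near 0 are attributed almost entirely to the null component, while weights of large magnitude in either direction are attributed to one of the lognormal components. In practice the mixture parameters would be estimated from the data rather than fixed by hand.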

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8a12/7877669/cde47eb2712e/pone.0246945.g001.jpg
