A mixture model to detect edges in sparse co-expression graphs with an application for comparing breast cancer subtypes.

Affiliations

Department of Statistics, University of Connecticut, Storrs, CT, United States of America.

Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States of America.

Publication information

PLoS One. 2021 Feb 11;16(2):e0246945. doi: 10.1371/journal.pone.0246945. eCollection 2021.

DOI:10.1371/journal.pone.0246945

PMID:33571253

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7877669/

Abstract

We develop a method to recover a gene network's structure from co-expression data, measured in terms of normalized Pearson's correlation coefficients between gene pairs. We treat these co-expression measurements as weights in the complete graph in which nodes correspond to genes. To decide which edges exist in the gene network, we fit a three-component mixture model such that the observed weights of 'null edges' follow a normal distribution with mean 0, and the weights of non-null edges follow a mixture of two lognormal distributions, one for positively correlated and one for negatively correlated pairs. We show that this so-called L2N mixture model outperforms other methods in terms of power to detect edges, and that it allows control of the false discovery rate. Importantly, our method makes no assumptions about the true network structure. We demonstrate our method, which is implemented in an R package called edgefinder, using a large dataset consisting of expression values of 12,750 genes obtained from 1,616 women. We infer the gene network structure by cancer subtype and find insightful subtype characteristics. For example, we find thirteen pathways that are enriched in each of the cancer groups but not in the Normal group, with two of the pathways associated with autoimmune diseases and two others with graft rejection. We also find specific characteristics of different breast cancer subtypes. For example, the Luminal A network includes a single, highly connected cluster of genes, which is enriched in the human diseases category, and in the Her2 subtype network we find a distinct, highly interconnected cluster that is uniquely enriched in drug metabolism pathways.
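The three-component mixture described in the abstract can be illustrated with a minimal Python sketch. This is not the edgefinder implementation or its API. The function names, the parameter names, the mirrored lognormal for negative weights, and the averaging rule used for false discovery rate control are all assumptions made for illustration, and the parameter values in the usage note are invented rather than fitted.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)

def normal_pdf(w, sigma):
    """Density of the null component: normal with mean 0 and sd sigma."""
    return math.exp(-0.5 * (w / sigma) ** 2) / (sigma * SQRT_2PI)

def lognormal_pdf(x, mu, s):
    """Lognormal density with log-scale mean mu and sd s; zero for x <= 0."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / s
    return math.exp(-0.5 * z * z) / (x * s * SQRT_2PI)

def posterior_null(w, p_pos, p_neg, sigma, mu_pos, s_pos, mu_neg, s_neg):
    """Posterior probability that an edge with weight w is a null edge.

    Assumption: negative weights are modelled by a lognormal on -w.
    """
    p0 = 1.0 - p_pos - p_neg
    null = p0 * normal_pdf(w, sigma)
    pos = p_pos * lognormal_pdf(w, mu_pos, s_pos)
    neg = p_neg * lognormal_pdf(-w, mu_neg, s_neg)
    return null / (null + pos + neg)

def select_edges(post_null, q=0.05):
    """Keep the largest set of edges whose mean posterior null probability
    is at most q (an assumed Bayesian-style FDR rule, for illustration)."""
    order = sorted(range(len(post_null)), key=lambda i: post_null[i])
    kept, running = [], 0.0
    for i in order:
        if (running + post_null[i]) / (len(kept) + 1) > q:
            break
        running += post_null[i]
        kept.append(i)
    return kept

# Hypothetical parameter values, chosen only to exercise the functions.
params = dict(p_pos=0.02, p_neg=0.02, sigma=1.0,
              mu_pos=math.log(5), s_pos=0.3,
              mu_neg=math.log(5), s_neg=0.3)
weights = [0.5, 6.0, -6.0, 1.2]
post = [posterior_null(w, **params) for w in weights]
edges = select_edges(post, q=0.05)
```

With these made-up parameters, weights near zero are attributed almost entirely to the null component, while weights far from zero in either direction are attributed to the lognormal components and are the ones `select_edges` keeps.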

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8a12/7877669/cde47eb2712e/pone.0246945.g001.jpg

Similar articles

k-core genes underpin structural features of breast cancer.

Sci Rep. 2021 Aug 11;11(1):16284. doi: 10.1038/s41598-021-95313-y.

A three-gene model to robustly identify breast cancer molecular subtypes.

J Natl Cancer Inst. 2012 Feb 22;104(4):311-25. doi: 10.1093/jnci/djr545. Epub 2012 Jan 18.

NRF1 motif sequence-enriched genes involved in ER/PR -ve HER2 +ve breast cancer signaling pathways.

Breast Cancer Res Treat. 2018 Nov;172(2):469-485. doi: 10.1007/s10549-018-4905-9. Epub 2018 Aug 20.

Investigation of genes and pathways involved in breast cancer subtypes through gene expression meta-analysis.

Gene. 2022 May 5;821:146328. doi: 10.1016/j.gene.2022.146328. Epub 2022 Feb 16.

A network-based, integrative study to identify core biological pathways that drive breast cancer clinical subtypes.

Br J Cancer. 2012 Mar 13;106(6):1107-16. doi: 10.1038/bjc.2011.584. Epub 2012 Feb 16.

Identification of breast cancer prognostic modules via differential module selection based on weighted gene Co-expression network analysis.

Biosystems. 2021 Jan;199:104317. doi: 10.1016/j.biosystems.2020.104317. Epub 2020 Dec 3.

Intrinsic subtypes from PAM50 gene expression assay in a population-based breast cancer cohort: differences by age, race, and tumor characteristics.

Cancer Epidemiol Biomarkers Prev. 2014 May;23(5):714-24. doi: 10.1158/1055-9965.EPI-13-1023. Epub 2014 Feb 12.

Identifying Cancer Subtypes Using a Residual Graph Convolution Model on a Sample Similarity Network.

Genes (Basel). 2021 Dec 27;13(1):65. doi: 10.3390/genes13010065.

Association of high obesity with PAM50 breast cancer intrinsic subtypes and gene expression.

BMC Cancer. 2015 Apr 14;15:278. doi: 10.1186/s12885-015-1263-4.

Cited by

On Graphical Models and Convex Geometry.

Comput Stat Data Anal. 2023 Nov;187. doi: 10.1016/j.csda.2023.107800. Epub 2023 Jun 14.

References

Tetraspanin CD53: an overlooked regulator of immune cell function.

Med Microbiol Immunol. 2020 Aug;209(4):545-552. doi: 10.1007/s00430-020-00677-z. Epub 2020 May 21.

The Tetraspanin CD53 Regulates Early B Cell Development by Promoting IL-7R Signaling.

J Immunol. 2020 Jan 1;204(1):58-67. doi: 10.4049/jimmunol.1900539. Epub 2019 Nov 20.

A data-driven interactome of synergistic genes improves network-based cancer outcome prediction.

PLoS Comput Biol. 2019 Feb 6;15(2):e1006657. doi: 10.1371/journal.pcbi.1006657. eCollection 2019 Feb.

TESTING HIGH-DIMENSIONAL COVARIANCE MATRICES, WITH APPLICATION TO DETECTING SCHIZOPHRENIA RISK GENES.

Ann Appl Stat. 2017 Sep;11(3):1810-1831. doi: 10.1214/17-AOAS1062. Epub 2017 Oct 5.

Cytochrome P450 3A4 and CYP3A5-Catalyzed Bioactivation of Lapatinib.

Drug Metab Dispos. 2016 Oct;44(10):1584-97. doi: 10.1124/dmd.116.070839. Epub 2016 Jul 22.

Large-Scale Multiple Testing of Correlations.

J Am Stat Assoc. 2016;111(513):229-240. doi: 10.1080/01621459.2014.999157. Epub 2016 May 5.

Pharmacometabolomics study identifies circulating spermidine and tryptophan as potential biomarkers associated with the complete pathological response to trastuzumab-paclitaxel neoadjuvant therapy in HER-2 positive breast cancer.

Oncotarget. 2016 Jun 28;7(26):39809-39822. doi: 10.18632/oncotarget.9489.

The huge Package for High-dimensional Undirected Graph Estimation in R.

J Mach Learn Res. 2012 Apr;13:1059-1062.

The influence of steroid receptor status on the cardiotoxicity risk in HER2-positive breast cancer patients receiving trastuzumab.

Arch Med Sci. 2015 Apr 25;11(2):371-7. doi: 10.5114/aoms.2015.50969. Epub 2015 Apr 23.

Current composite-feature classification methods do not outperform simple single-genes classifiers in breast cancer prognosis.

Front Genet. 2013 Dec 23;4:289. doi: 10.3389/fgene.2013.00289. eCollection 2013.
