Improved gene co-expression network quality through expression dataset down-sampling and network aggregation.

Affiliations

EA2106 BBV, Université de Tours, Tours, 37200, France.

EA3142 GEIHP, Université d'Angers, Université Bretagne-Loire, Angers, 49100, France.

Publication information

Sci Rep. 2019 Oct 8;9(1):14431. doi: 10.1038/s41598-019-50885-8.

DOI:10.1038/s41598-019-50885-8

PMID:31594989

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6783424/

Abstract

Large-scale gene co-expression networks are an effective methodology to analyze sets of co-expressed genes and discover new gene functions or associations. Distances between genes are estimated according to their expression profiles and are visualized in networks that may be further partitioned to reveal communities of co-expressed genes. Creating expression profiles is now eased by the large amounts of publicly available expression data (microarrays and RNA-seq). Although many distance calculation methods have been intensively compared and reviewed in the past, it is unclear how to proceed when many samples reflecting a wide range of different conditions are available. Should as many samples as possible be integrated into network construction or be partitioned into smaller sets of more related samples? Previous studies have indicated a saturation in network performances to capture known associations once a certain number of samples is included in distance calculations. Here, we examined the influence of sample size on co-expression network construction using microarray and RNA-seq expression data from three plant species. We tested different down-sampling methods and compared network performances in recovering known gene associations to networks obtained from full datasets. We further examined how aggregating networks may help increase this performance by testing six aggregation methods.
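The down-sample-then-aggregate idea described in the abstract can be illustrated with a minimal sketch. The abstract does not name the down-sampling methods, the distance measure, or the six aggregation methods that were tested, so everything here is an assumption made for illustration: random down-sampling without replacement, Pearson correlation as the co-expression measure, and element-wise mean or maximum as the aggregation rule. The function names `downsample_networks` and `aggregate_networks` and the toy data are hypothetical.

```python
import numpy as np


def downsample_networks(expr, n_subsets, subset_size, rng):
    """Build one gene-by-gene Pearson correlation matrix per random sample subset.

    expr: array of shape (n_genes, n_samples).
    Random down-sampling and Pearson correlation are illustrative assumptions,
    not necessarily the methods used in the paper.
    """
    n_samples = expr.shape[1]
    networks = []
    for _ in range(n_subsets):
        # Draw a subset of samples (columns) without replacement.
        cols = rng.choice(n_samples, size=subset_size, replace=False)
        # np.corrcoef treats rows as variables, so the result is genes x genes.
        networks.append(np.corrcoef(expr[:, cols]))
    return networks


def aggregate_networks(networks, how="mean"):
    """Combine several co-expression matrices element-wise.

    "mean" and "max" are two simple aggregation rules chosen for illustration.
    """
    stack = np.stack(networks)
    if how == "mean":
        return stack.mean(axis=0)
    if how == "max":
        return stack.max(axis=0)
    raise ValueError(f"unknown aggregation rule: {how}")


# Toy data: 6 genes, 40 samples; gene 1 is built to track gene 0 closely.
rng = np.random.default_rng(0)
n_genes, n_samples = 6, 40
expr = rng.normal(size=(n_genes, n_samples))
expr[1] = expr[0] + 0.1 * rng.normal(size=n_samples)

nets = downsample_networks(expr, n_subsets=5, subset_size=15, rng=rng)
agg = aggregate_networks(nets, how="mean")
```

In this sketch each subset yields a full correlation matrix, and the aggregated matrix `agg` has the same genes-by-genes shape, so it could be thresholded or ranked in the same way as a network built from the full dataset.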

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e4af/6783424/f1a084d3c065/41598_2019_50885_Fig1_HTML.jpg
