Maximizing capture of gene co-expression relationships through pre-clustering of input expression samples: an Arabidopsis case study.

Authors

Feltus F Alex, Ficklin Stephen P, Gibson Scott M, Smith Melissa C

Affiliation

Department of Genetics & Biochemistry, Clemson University, 105 Collings Street, Clemson, SC 29634, USA.

Publication information

BMC Syst Biol. 2013 Jun 5;7:44. doi: 10.1186/1752-0509-7-44.

DOI:10.1186/1752-0509-7-44

PMID:23738693

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3679940/

Abstract

BACKGROUND

In genomics, highly relevant gene interaction (co-expression) networks have been constructed by finding significant pair-wise correlations between genes in expression datasets. These networks are then mined to elucidate biological function at the polygenic level. In some cases, networks may be constructed from input samples that measure gene expression under a variety of conditions, such as different genotypes, environments, disease states and tissues. When large sets of samples are obtained from public repositories, it is often impractical to sort samples into condition-specific groups, and combining samples from various conditions has a negative effect on network size. A fixed significance threshold is often applied as well, which further limits the size of the final network. Therefore, we propose pre-clustering of input expression samples to approximate condition-specific grouping of samples, followed by individual network construction for each group, as a means of dynamic significance thresholding. The net effect is increased sensitivity, thus maximizing the total number of co-expression relationships in the final co-expression network compendium.

RESULTS

A total of 86 Arabidopsis thaliana co-expression networks were constructed after k-means partitioning of 7,105 publicly available ATH1 Affymetrix microarray samples. We term each pre-sorted network a Gene Interaction Layer (GIL). Random Matrix Theory (RMT), an un-supervised thresholding method, was used to threshold each of the 86 networks independently, effectively providing a dynamic (non-global) threshold for each network. Across all GILs, the compendium covered 19,588 genes (94.7% of measured genes) and 558,022 unique co-expression relationships. In comparison, network construction without pre-sorting of input samples yielded only 3,297 genes (15.9%) and 129,134 relationships in the global network.
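The pre-clustering idea can be illustrated with a minimal Python sketch on synthetic data. It is not the authors' pipeline: the function names (`kmeans_labels`, `coexpression_edges`, `gil_compendium`), the farthest-first initialisation, the toy data, and the fixed |r| cutoff of 0.8 are all assumptions made for illustration. In particular, the constant cutoff passed as `threshold_for_group` is only a placeholder for RMT thresholding, which is not implemented here.

```python
import numpy as np


def kmeans_labels(X, k, n_iter=100):
    """Lloyd's k-means over the rows of X (samples x genes); one label per sample."""
    # Farthest-first initialisation: deterministic, chosen only to keep the sketch reproducible.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels


def coexpression_edges(X, threshold):
    """Gene pairs (i, j), i < j, whose absolute Pearson correlation is >= threshold."""
    with np.errstate(invalid="ignore", divide="ignore"):
        r = np.nan_to_num(np.corrcoef(X, rowvar=False))  # constant genes give NaN
    i, j = np.where(np.triu(np.abs(r) >= threshold, k=1))
    return set(zip(i.tolist(), j.tolist()))


def gil_compendium(X, k, threshold_for_group, min_samples=3):
    """Union of per-cluster networks; threshold_for_group is a hypothetical RMT stand-in."""
    labels = kmeans_labels(X, k)
    union = set()
    for g in range(k):
        group = X[labels == g]
        if len(group) >= min_samples:
            union |= coexpression_edges(group, threshold_for_group(group))
    return union


# Synthetic data: genes 0 and 1 are co-expressed positively in condition A and
# negatively in condition B; gene 2 marks the condition; genes 3 and 4 are noise.
rng = np.random.default_rng(0)
n = 100
a, b = rng.normal(size=n), rng.normal(size=n)
cond_a = np.column_stack([a + rng.normal(scale=0.1, size=n),
                          a + rng.normal(scale=0.1, size=n),
                          rng.normal(0, 1, n), rng.normal(size=n), rng.normal(size=n)])
cond_b = np.column_stack([b + rng.normal(scale=0.1, size=n),
                          -b + rng.normal(scale=0.1, size=n),
                          rng.normal(20, 1, n), rng.normal(size=n), rng.normal(size=n)])
X = np.vstack([cond_a, cond_b])

global_edges = coexpression_edges(X, 0.8)  # all samples pooled
compendium = gil_compendium(X, k=2, threshold_for_group=lambda group: 0.8)
print("global:", sorted(global_edges), "compendium:", sorted(compendium))
```

In this toy setting the opposite-signed relationships between genes 0 and 1 cancel when the samples are pooled, so the pair is only recovered once the samples are partitioned. A data-driven threshold per cluster, such as the RMT procedure named above, would take the place of the constant cutoff.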

CONCLUSIONS

Here we show that pre-clustering of microarray samples helps approximate condition-specific networks and allows for dynamic thresholding using un-supervised methods. Because RMT ensures that only highly significant interactions are kept, the GIL compendium consists of 558,022 unique, high-quality A. thaliana co-expression relationships across almost all of the measurable genes on the ATH1 array. For A. thaliana, these networks represent the largest compendium to date of significant gene co-expression relationships, and are a means to explore complex pathway, polygenic, and pleiotropic relationships for this focal model plant. The networks can be explored at sysbio.genome.clemson.edu. Finally, this method is applicable to any large expression profile collection for any organism and is best suited to cases where a knowledge-independent network construction method is desired.
