

Comparison of threshold selection methods for microarray gene co-expression matrices.

Authors

Borate Bhavesh R, Chesler Elissa J, Langston Michael A, Saxton Arnold M, Voy Brynn H

Affiliation

Department of Animal Science, University of Tennessee, Knoxville, Tennessee, USA.

Publication

BMC Res Notes. 2009 Dec 2;2:240. doi: 10.1186/1756-0500-2-240.

Abstract

BACKGROUND

Network and clustering analyses of microarray co-expression correlation data often require application of a threshold to discard small correlations, thus reducing computational demands and decreasing the number of uninformative correlations. This study investigated threshold selection in the context of combinatorial network analysis of transcriptome data.
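As a rough illustration of what applying a threshold means here, the sketch below builds the edge list of a co-expression network by keeping only gene pairs whose correlation magnitude reaches a cutoff. It is a minimal sketch, not the authors' pipeline: the use of Pearson correlation, the use of the absolute value, and the names `pearson`, `threshold_edges`, and `profiles` are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumed measure).

    Raises ZeroDivisionError if either profile is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def threshold_edges(profiles, threshold):
    """Return gene pairs whose |correlation| is at least `threshold`.

    `profiles` maps a gene name to its list of expression values.
    Using |r| (keeping strong negative correlations) is an assumption.
    """
    edges = set()
    for g1, g2 in combinations(sorted(profiles), 2):
        if abs(pearson(profiles[g1], profiles[g2])) >= threshold:
            edges.add((g1, g2))
    return edges


# Hypothetical toy data: four genes measured at four time points.
profiles = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "c": [4, 3, 2, 1],
    "d": [1, 3, 2, 4],
}
edges = threshold_edges(profiles, threshold=0.9)
```

Raising the threshold removes edges, which shrinks the graph handed to later clique or clustering steps; the pairwise loop is quadratic in the number of genes, so a real analysis would typically use a vectorised correlation routine instead.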

FINDINGS

Six conceptually diverse methods - based on number of maximal cliques, correlation of control spots with expressed genes, top 1% of correlations, spectral graph clustering, Bonferroni correction of p-values, and statistical power - were used to estimate a correlation threshold for three time-series microarray datasets. The validity of the thresholds was tested by comparison to thresholds derived from Gene Ontology information. Stability and reliability of the best methods were evaluated with block bootstrapping. Two threshold methods, number of maximal cliques and spectral graph clustering, used information in the correlation matrix structure and performed well in terms of stability. Comparison to Gene Ontology found that thresholds based on the number of maximal cliques extracted from a co-expression matrix were the most biologically valid. Approaches to improve both methods were suggested.
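Two of the pairwise methods named above, the top 1% of correlations and Bonferroni correction of p-values, are simple enough to sketch. The code below is a hedged illustration under stated assumptions, not the procedure used in the study: the function names are hypothetical, correlations are ranked by absolute value, and the Bonferroni cutoff uses the Fisher z approximation for the sampling distribution of a correlation rather than an exact t-test.

```python
from math import ceil, sqrt, tanh
from statistics import NormalDist


def top_fraction_threshold(correlations, fraction=0.01):
    """Smallest |r| among the top `fraction` of correlations.

    Ranking by absolute value is an assumption of this sketch.
    """
    ranked = sorted((abs(r) for r in correlations), reverse=True)
    keep = max(1, ceil(fraction * len(ranked)))
    return ranked[keep - 1]


def bonferroni_threshold(n_genes, n_samples, alpha=0.05):
    """Approximate |r| cutoff after Bonferroni correction.

    The family-wise alpha is divided by the number of gene pairs, and the
    two-sided critical value is mapped back to a correlation with the
    Fisher z approximation (standard error 1 / sqrt(n_samples - 3)).
    Requires n_samples > 3.
    """
    n_tests = n_genes * (n_genes - 1) // 2
    per_test_alpha = alpha / n_tests
    z_crit = NormalDist().inv_cdf(1 - per_test_alpha / 2)
    return tanh(z_crit / sqrt(n_samples - 3))


# Hypothetical inputs for illustration only.
cutoff_top = top_fraction_threshold([0.1, -0.9, 0.5, 0.3], fraction=0.5)
cutoff_bonf = bonferroni_threshold(n_genes=100, n_samples=20)
```

Both cutoffs depend only on pairwise statistics or on the marginal distribution of correlations, which is the contrast the abstract draws with the clique-based and spectral methods that use the structure of the whole matrix.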

CONCLUSION

Threshold selection approaches based on network structure of gene relationships gave thresholds with greater relevance to curated biological relationships than approaches based on statistical pair-wise relationships.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f9d7/2794870/08aa1bd07630/1756-0500-2-240-1.jpg
