Department of Agriculture, University of Naples Federico II, Portici, Italy.
Dipartimento di Agraria, University of Naples Federico II, via Università, Portici (NA), Italy.
Brief Bioinform. 2017 Mar 1;18(2):215-225. doi: 10.1093/bib/bbw002.
Bioinformatics web-based resources and databases are precious references for most biological laboratories worldwide. However, the quality and reliability of the information they provide depend on their being used appropriately, in a way that takes their specific features into account. Huge collections of gene expression data are currently publicly available, ready to support the understanding of gene and genome functionalities. In this context, tools and resources for gene co-expression analyses have flourished to exploit the 'guilt by association' principle, which assumes that genes with correlated expression profiles are functionally related. In the case of Arabidopsis thaliana, the reference species in plant biology, the available resources mainly consist of microarray results. After a general overview of such resources, we tested and compared the results they offer for gene co-expression analysis. We also discuss how the results are affected by the use of different data sets, different data normalization approaches and different parameter settings, which often rely on different metrics for establishing co-expression. A dedicated example analysis of different gene pools, implemented by including or excluding mutant samples in a reference data set, showed significant variation in the occurrence, magnitude and direction of gene co-expression. We conclude that, because the heterogeneity of the resources and methods may produce different results for the same query genes, exploring more than one of the available resources is strongly recommended. The aim of this article is to show how best to integrate data sources and/or merge outputs to achieve robust analyses and reliable interpretations, thereby turning the diversity of data resources into an opportunity for added value.
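The sensitivity of co-expression to sample composition, as described in the abstract, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the expression values are synthetic, the sample labels (`wt`, `mutant`) and the helper names `pearson` and `coexpression` are hypothetical, and Pearson correlation stands in for whichever metric a given resource actually uses. It is not the analysis performed in the article.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Synthetic data (assumption): (sample type, expression of gene A, expression of gene B)
samples = [
    ("wt", 1.0, 1.2),
    ("wt", 2.0, 1.9),
    ("wt", 3.0, 3.2),
    ("wt", 4.0, 3.8),
    ("wt", 5.0, 5.1),
    ("mutant", 9.0, 0.5),
    ("mutant", 10.0, 0.3),
    ("mutant", 11.0, 0.2),
]


def coexpression(samples, include_mutants):
    """Correlate gene A with gene B, optionally dropping mutant samples."""
    kept = [s for s in samples if include_mutants or s[0] == "wt"]
    gene_a = [s[1] for s in kept]
    gene_b = [s[2] for s in kept]
    return pearson(gene_a, gene_b)


r_wt_only = coexpression(samples, include_mutants=False)
r_all = coexpression(samples, include_mutants=True)

print(f"wild-type samples only: r = {r_wt_only:.2f}")
print(f"all samples (with mutants): r = {r_all:.2f}")
```

With these made-up values the two genes are strongly positively correlated across the wild-type samples, while adding the mutant samples turns the overall correlation negative. This is the kind of change in magnitude and direction that makes it worthwhile to check which samples underlie a reported co-expression value.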