Institute of Bioinformatics and Applied Biotechnology, Biotech Park, Electronic City, Bangalore - 560100, Karnataka state, India.
BMC Genomics. 2010 Aug 11;11:467. doi: 10.1186/1471-2164-11-467.
In recent years, the number of gene expression profiling reports has risen. Unfortunately, the available gene expression data have not been used to their full potential. Many databases and programs can be used to derive the possible expression patterns of mammalian genes from existing data. However, these resources have limitations. For example, it is not possible to obtain a list of genes that are expressed under certain conditions. To overcome such limitations, we adopted a new strategy that predicts gene expression patterns from available information, one tissue at a time.
The first step of this approach was the manual collection of as much data as possible from large-scale (genome-wide) gene expression studies of the mammalian testis. These data were compiled into the Mammalian Gene Expression Testis-database (MGEx-Tdb). This process yielded a richer collection of gene expression data, covering multiple testicular conditions, than other databases and resources offer. The gene lists collected in this way were then used to derive a 'consensus' expression status for each gene across studies. The expression information obtained from the new database mostly agreed with results from multiple small-scale studies on selected genes. A comparative analysis showed that MGEx-Tdb retrieves gene expression information more efficiently than other commonly used databases. It can provide a clear expression status (transcribed or dormant) for most genes in testis tissue, under several specific physiological or experimental conditions and/or cell types.
Manual compilation of gene expression data, although painstaking, followed by determination of a consensus expression status for specific locations and conditions, can be a reliable way of using existing data to predict gene expression patterns. MGEx-Tdb provides expression information for 14 different combinations of specific locations and conditions in humans (25,158 genes), 79 in mice (22,919 genes) and 23 in rats (14,108 genes). It is also the first system that predicts gene expression together with a 'reliability-score', which is calculated from the extent of agreement and contradiction across gene-sets/studies. This new platform is publicly available at the following web address: http://resource.ibab.ac.in/MGEx-Tdb/.
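The idea of a consensus expression status with a reliability score can be illustrated with a minimal sketch. The abstract does not give the actual MGEx-Tdb formula, so everything below is an assumption made for illustration: the function name consensus_status, the labels "transcribed" and "dormant" as per-study calls, simple majority voting, and a score defined as the absolute difference between agreeing and contradicting studies divided by the total number of studies.

```python
from collections import Counter


def consensus_status(calls):
    """Derive a consensus status and a score from per-study calls.

    `calls` is a list with one entry per study, each either
    "transcribed" or "dormant". Both the voting rule and the score
    are hypothetical; they are not the published MGEx-Tdb method.
    """
    counts = Counter(calls)
    transcribed = counts["transcribed"]
    dormant = counts["dormant"]
    total = transcribed + dormant

    if total == 0:
        # No study reported on this gene under this condition.
        return ("unknown", 0.0)
    if transcribed == dormant:
        # Agreements and contradictions cancel out exactly.
        return ("uncertain", 0.0)

    status = "transcribed" if transcribed > dormant else "dormant"
    # Assumed score: 1.0 when all studies agree, approaching 0.0
    # as contradictions grow.
    score = abs(transcribed - dormant) / total
    return (status, score)


if __name__ == "__main__":
    # Hypothetical calls for one gene in one testicular condition.
    example = ["transcribed", "transcribed", "transcribed", "dormant"]
    print(consensus_status(example))
```

Under these assumptions, a gene reported as transcribed in three studies and dormant in one receives the status "transcribed" with a score of 0.5, whereas unanimous studies give a score of 1.0.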