Centre for Genome Research, School of Biological Sciences, University of Liverpool, Crown Street, Liverpool, L69 7ZB, UK.
BMC Genomics. 2009 Nov 26;10:560. doi: 10.1186/1471-2164-10-560.
Sequence identification of ESTs from non-model species poses distinct challenges, particularly when these species have duplicated genomes and are phylogenetically distant from sequenced model organisms. For the common carp, an environmental model of aquacultural interest, large numbers of ESTs remained unidentified by BLAST sequence alignment. We have used the expression profiles from large-scale microarray experiments to suggest gene identities.
Expression profiles from approximately 700 cDNA microarrays describing responses of 7 major tissues to multiple environmental stressors were used to define a co-expression landscape. This was based on the Pearson correlation coefficient relating each gene to all other genes, from which a network description provided clusters of highly correlated genes as 'mountains'. We show that these clusters contain both genes with known identities and genes with unknown identities, and that the correlation constitutes evidence of identity for the latter. This procedure suggested identities for 522 of 2701 unknown carp EST sequences. We also discriminate several common carp genes and gene isoforms that were not discriminated by BLAST sequence alignment alone. Precision in identification was substantially improved by the use of data from multiple tissues and treatments.
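The correlation step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it replaces the network 'mountain' clustering with a much simpler rule (assign each unknown gene the annotation of its best-correlated known gene), and the function name `suggest_identities`, the gene labels, the toy expression values and the 0.8 cutoff are all hypothetical choices made for illustration.

```python
import numpy as np

def suggest_identities(expr, names, known, threshold=0.8):
    """Suggest annotations for unknown genes from co-expression.

    expr      : 2-D array, one row per gene, one column per microarray
    names     : list of gene/EST labels, one per row of expr
    known     : dict mapping some labels to an annotation
    threshold : minimum Pearson r to accept a suggestion (arbitrary assumption)
    """
    corr = np.corrcoef(expr)            # Pearson r between every pair of rows
    np.fill_diagonal(corr, -np.inf)     # never match a gene to itself
    known_idx = [i for i, n in enumerate(names) if n in known]
    suggestions = {}
    for i, name in enumerate(names):
        if name in known or not known_idx:
            continue
        # Simplification: take the single best-correlated known gene,
        # rather than a cluster ('mountain') of correlated genes.
        best = max(known_idx, key=lambda j: corr[i, j])
        if corr[i, best] >= threshold:
            suggestions[name] = (known[names[best]], float(corr[i, best]))
    return suggestions

# Toy data (hypothetical): 4 genes measured on 6 arrays.
base = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
other = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])
expr = np.vstack([base, 2 * base + 1, other, other[::-1]])
names = ["est_known_a", "est_unknown_1", "est_known_b", "est_unknown_2"]
known = {"est_known_a": "gene A", "est_known_b": "gene B"}

result = suggest_identities(expr, names, known)
print(result)
```

In this toy input, `est_unknown_1` is a linear rescaling of `est_known_a` and so inherits its annotation, while `est_unknown_2` correlates negatively with both known genes and is left unidentified. With real data, the profiles would span many tissues and treatments, which is what the abstract reports as improving precision.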
The detailed analysis of co-expression landscapes is a sensitive technique for suggesting identities for the large number of cDNAs generated in EST projects that BLAST fails to identify. It is capable of detecting even subtle differences in expression profiles, and thereby of separating genes that share a common BLAST identity into distinct identities. It benefits from the use of multiple treatments or contrasts, and from large-scale microarray data.