Nagaraj Shivashankar H, Gasser Robin B, Ranganathan Shoba
Department of Chemistry and Biomolecular Sciences & Biotechnology Research Institute, Macquarie University, Sydney, Australia.
Brief Bioinform. 2007 Jan;8(1):6-21. doi: 10.1093/bib/bbl015. Epub 2006 May 23.
Expressed sequence tag (EST) sequencing projects are underway for numerous organisms, generating millions of short, single-pass nucleotide sequence reads that accumulate in EST databases. Extensive computational strategies have been developed to organize and analyse both small- and large-scale EST data for gene discovery, transcript and single nucleotide polymorphism analysis, and functional annotation of putative gene products. We provide an overview of the significance of ESTs in the genomic era, their properties and their applications. We compare the methods adopted by various research groups for each step of EST analysis and identify the challenges that lie ahead in organizing and analysing the ever-increasing volume of EST data. The most appropriate software tools for EST pre-processing, clustering and assembly, database matching and functional annotation have been compiled (available online from http://biolinfo.org/EST). We propose a road map for EST analysis to accelerate the effective analysis of EST data sets. An investigation of EST analysis platforms reveals that they all terminate prior to downstream functional annotation, including gene ontologies, motif/pattern analysis and pathway mapping.