Department of Chemistry and Biomolecular Sciences and ARC Centre of Excellence, Macquarie University, Sydney, NSW 2109, Australia.
BMC Bioinformatics. 2012;13 Suppl 17(Suppl 17):S24. doi: 10.1186/1471-2105-13-S17-S24. Epub 2012 Dec 13.
The transcriptome of an organism can be studied by analysing expressed sequence tag (EST) data sets, a rapid and cost-effective approach supported by several new and updated bioinformatics methods and tools for assembly and annotation. Combined with genome and proteome analyses, such transcriptome analyses provide a comprehensive view of an organism. With the advent of large-scale sequencing projects generating sequence data at the protein and cDNA levels, an automated analysis pipeline is needed to store, organize and annotate ESTs.
TranSeqAnnotator is a workflow for the large-scale analysis of transcriptomic data that integrates the most appropriate bioinformatics tools for data management and analysis. The pipeline automatically cleans and clusters the sequences, assembles them into consensus sequences, conceptually translates these into possible protein products and assigns putative functions based on various DNA and protein similarity searches. Excretory/secretory (ES) proteins inferred from ESTs or short reads are also identified. TranSeqAnnotator accepts raw ESTs and their quality files in FASTA format, along with protein and short-read sequences, which are then analysed with user-selected programs. After pre-processing and assembly, the dataset is annotated at the nucleotide, protein and ES protein levels.
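The cleaning and conceptual-translation steps described above can be illustrated with a minimal Python sketch. It is not TranSeqAnnotator's implementation: the function names (`read_fasta`, `clean_est`, `longest_orf_protein`, `annotate_proteins`), the 8-nt poly-A/poly-T trimming threshold and the 100-nt default minimum length are hypothetical; clustering, assembly and similarity searches are omitted; and conceptual translation is approximated here by a simple search for the longest Met-initiated open reading frame (ORF) over all six reading frames, not by whichever translation tool the pipeline actually uses.

```python
import re
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Standard genetic code; codons enumerated in TCAG order (first, second, third base).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def clean_est(seq: str, min_len: int = 100) -> Optional[str]:
    """Hypothetical cleaning rule: keep only A/C/G/T/N, trim a 3' poly-A tail
    and a 5' poly-T head of 8+ bases, then discard reads shorter than min_len."""
    seq = re.sub(r"[^ACGTN]", "", seq.upper())
    seq = re.sub(r"A{8,}$", "", seq)
    seq = re.sub(r"^T{8,}", "", seq)
    return seq if len(seq) >= min_len else None


def translate(seq: str) -> str:
    """Translate codon by codon; codons containing N become 'X', '*' marks a stop."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def longest_orf_protein(seq: str) -> str:
    """Return the longest Met-initiated peptide over all six reading frames.
    An ORF may run off the 3' end, since ESTs are often partial transcripts."""
    best = ""
    rev = seq.translate(_COMPLEMENT)[::-1]
    for strand in (seq, rev):
        for frame in range(3):
            for segment in translate(strand[frame:]).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best


def annotate_proteins(fasta_lines: Iterable[str], min_len: int = 100) -> Dict[str, str]:
    """Clean each EST and map its header to its putative protein product."""
    proteins: Dict[str, str] = {}
    for header, raw in read_fasta(fasta_lines):
        cleaned = clean_est(raw, min_len)
        if cleaned:
            protein = longest_orf_protein(cleaned)
            if protein:
                proteins[header] = protein
    return proteins


if __name__ == "__main__":
    # Toy EST: a short ORF followed by a poly-A tail that clean_est trims.
    demo = [">est1", "ATG" + "GCT" * 10, "TAG" + "A" * 12]
    print(annotate_proteins(demo, min_len=30))
```

Running the script prints the putative protein for the toy EST after its poly-A tail has been trimmed. The sketch does not correct the frameshift errors that are common in single-pass EST reads.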
TranSeqAnnotator has been developed on a Linux cluster to perform exhaustive and reliable analyses and to provide detailed annotation. It outputs Gene Ontology annotations and functional protein identifications in terms of mappings to protein domains and metabolic pathways. The pipeline has been applied to annotate large EST datasets, identifying several novel and known genes with experimental therapeutic validation that could serve as potential targets for parasite intervention. TranSeqAnnotator is freely available to the scientific community at http://estexplorer.biolinfo.org/TranSeqAnnotator/.