TranSeqAnnotator: large-scale analysis of transcriptomic data.

Affiliation

Department of Chemistry and Biomolecular Sciences and ARC Centre of Excellence, Macquarie University, Sydney, NSW 2109, Australia.

Publication information

BMC Bioinformatics. 2012;13 Suppl 17(Suppl 17):S24. doi: 10.1186/1471-2105-13-S17-S24. Epub 2012 Dec 13.

DOI: 10.1186/1471-2105-13-S17-S24

PMID: 23282024

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3521237/

Abstract

BACKGROUND

The transcriptome of an organism can be studied by analysing expressed sequence tag (EST) data sets, a rapid and cost-effective approach supported by several new and updated bioinformatics approaches and tools for assembly and annotation. Together with genome and proteome analysis, such analyses give a comprehensive picture of an organism. With the advent of large-scale sequencing projects and the generation of sequence data at the protein and cDNA levels, an automated analysis pipeline is necessary to store, organize and annotate ESTs.

RESULTS

TranSeqAnnotator is a workflow for large-scale analysis of transcriptomic data that uses the most appropriate bioinformatics tools for data management and analysis. The pipeline automatically cleans, clusters and assembles the sequences, generates consensus sequences, conceptually translates these into possible protein products and assigns putative functions based on various DNA and protein similarity searches. Excretory/secretory (ES) proteins inferred from ESTs/short reads are also identified. TranSeqAnnotator accepts raw and quality ESTs in FASTA format, along with protein and short-read sequences, and analyses them with user-selected programs. After pre-processing and assembly, the dataset is annotated at the nucleotide, protein and ES protein levels.
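The conceptual translation step can be illustrated with a minimal Python sketch. This is not TranSeqAnnotator's own code, and the abstract does not name the program it uses for this step; the sketch simply assumes the standard genetic code and a plain six-frame translation. The function names (`parse_fasta`, `six_frame_translations`, `longest_orf`) are hypothetical, chosen for illustration only.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def parse_fasta(text):
    """Return {identifier: sequence} from FASTA-formatted text."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def reverse_complement(seq):
    """Reverse-complement a DNA sequence (A, C, G, T, N)."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Translate all three forward and three reverse-complement frames."""
    frames = {}
    for sign, strand in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = translate(strand[offset:])
    return frames


def longest_orf(seq):
    """Return (frame, peptide) for the longest stop-free stretch in any frame.

    No start codon is required, because ESTs are often partial transcripts.
    """
    best_frame, best_peptide = None, ""
    for frame, protein in six_frame_translations(seq).items():
        for segment in protein.split("*"):
            if len(segment) > len(best_peptide):
                best_frame, best_peptide = frame, segment
    return best_frame, best_peptide


if __name__ == "__main__":
    records = parse_fasta(">contig1 example\nATGGCCAAGTAA\n")
    for name, seq in records.items():
        frame, peptide = longest_orf(seq)
        print(name, frame, peptide)
```

Picking the longest stop-free stretch is a deliberately crude heuristic. Dedicated tools additionally model codon usage and sequencing errors such as frameshifts, which this sketch ignores.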

CONCLUSION

TranSeqAnnotator was developed on a Linux cluster to perform exhaustive and reliable analysis and to provide detailed annotation. It outputs gene ontologies and protein functional identifications in terms of mapping to protein domains and metabolic pathways. The pipeline has been applied to annotate large EST datasets, identifying several novel and known genes with therapeutic experimental validations that could serve as potential targets for parasite intervention. TranSeqAnnotator is freely available to the scientific community at http://estexplorer.biolinfo.org/TranSeqAnnotator/.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/11e0/3521237/67a64e7eb8c3/1471-2105-13-S17-S24-1.jpg
