

galaxieEST: addressing EST identity through automated phylogenetic analysis.

Authors

Nilsson R Henrik, Rajashekar Balaji, Larsson Karl-Henrik, Ursing Björn M

Affiliation

Göteborg University, Botanical Institute, SE-405 30, Sweden.

Publication information

BMC Bioinformatics. 2004 Jul 5;5:87. doi: 10.1186/1471-2105-5-87.

Abstract

BACKGROUND

Research involving expressed sequence tags (ESTs) is intricately coupled to the existence of large, well-annotated sequence repositories. Comparatively complete and satisfactorily annotated public sequence libraries are, however, available only for a limited range of organisms, rendering the absence of sequences and gene structure information a tangible problem for those working with taxa lacking an EST or genome sequencing project. Paralogous genes belonging to the same gene family but distinguished by derived characteristics are particularly prone to misidentification and erroneous annotation; high but incomplete levels of sequence similarity are typically difficult to interpret and have formed the basis of many unsubstantiated assumptions of orthology. In these cases, a phylogenetic study of the query sequence together with the most similar sequences in the database may be of great value to the identification process. To facilitate this laborious procedure, a project to employ automated phylogenetic analysis in the identification of ESTs was initiated.

RESULTS

galaxieEST is an open source Perl-CGI script package designed to complement traditional similarity-based identification of EST sequences through employment of automated phylogenetic analysis. It uses a series of BLAST runs as a sieve to retrieve nucleotide and protein sequences for inclusion in neighbour joining and parsimony analyses; the output includes the BLAST output, the results of the phylogenetic analyses, and the corresponding multiple alignments. galaxieEST is available as an on-line web service for identification of fungal ESTs and for download / local installation for use with any organism group at http://galaxie.cgb.ki.se/galaxieEST.html.
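
The idea of using BLAST as a sieve can be illustrated with a minimal Python sketch. It is not taken from galaxieEST (which is written in Perl) and does not reproduce its actual selection rules; it only assumes that hits are available in the standard 12-column tabular BLAST format. The function name `sieve_hits`, the E-value cutoff, the hit limit, and the example rows are all hypothetical.

```python
import csv
import io

# The 12 standard columns of tabular BLAST output.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def sieve_hits(tabular_text, max_evalue=1e-5, max_hits=3):
    """Return subject IDs that pass the E-value cutoff, best bit score first.

    Cutoff and hit limit are illustrative assumptions, not galaxieEST defaults.
    """
    best = {}  # subject ID -> best bit score seen for that subject
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        evalue = float(hit["evalue"])
        score = float(hit["bitscore"])
        if evalue > max_evalue:
            continue  # too weak to be worth aligning
        sid = hit["sseqid"]
        if sid not in best or score > best[sid]:
            best[sid] = score  # keep one entry per subject sequence
    ranked = sorted(best, key=lambda s: best[s], reverse=True)
    return ranked[:max_hits]


# Made-up hits for a single query EST (tab-separated).
EXAMPLE = "\n".join([
    "est1\tseqA\t92.0\t200\t16\t0\t1\t200\t10\t209\t1e-80\t310",
    "est1\tseqB\t88.0\t190\t22\t1\t5\t194\t30\t219\t3e-60\t240",
    "est1\tseqA\t80.0\t60\t12\t0\t210\t269\t400\t459\t2e-10\t70",
    "est1\tseqC\t70.0\t40\t12\t0\t1\t40\t1\t40\t0.5\t30",
    "est1\tseqD\t85.0\t120\t18\t0\t1\t120\t1\t120\t1e-20\t110",
    "est1\tseqE\t78.0\t80\t17\t1\t1\t80\t1\t80\t1e-8\t60",
])

if __name__ == "__main__":
    print(sieve_hits(EXAMPLE))
```

In a pipeline of this kind, the retained subject sequences would then be aligned with the query and passed to the tree-building step.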

CONCLUSIONS

By addressing sequence relatedness in addition to similarity, galaxieEST provides an integrative view on EST origin and identity, which may prove particularly useful in cases where similarity searches return one or more pertinent, but not full, matches and additional information on the query EST is needed.
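
The difference between similarity and relatedness can be shown with a small neighbour joining sketch in Python. This is a textbook version of the algorithm that reports only the order of joins, not branch lengths, and it is not the implementation galaxieEST relies on. The four taxa and their pairwise distances are invented for illustration: the query is closest in raw distance to B, yet neighbour joining pairs it with A, a sister sequence sitting on a long branch.

```python
from itertools import combinations


def neighbour_joining(names, dist):
    """Return the joins made by neighbour joining as (node_i, node_j, new_node).

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) -> pairwise distance.
    """
    nodes = list(names)
    d = dict(dist)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all other current nodes.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a)
             for a in nodes}
        # Join the pair that minimises the Q criterion.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        new = f"({i},{j})"
        # Distance from the new internal node to each remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((new, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))]
                    - d[frozenset((i, j))])
        nodes = [k for k in nodes if k not in (i, j)] + [new]
        joins.append((i, j, new))
    return joins


# Hypothetical distances: A is the true sister of the query but evolves fast.
NAMES = ["query", "A", "B", "C"]
PAIRS = {("query", "A"): 7, ("query", "B"): 3, ("query", "C"): 3,
         ("A", "B"): 8, ("A", "C"): 8, ("B", "C"): 2}
DIST = {frozenset(pair): value for pair, value in PAIRS.items()}

if __name__ == "__main__":
    closest = min(["A", "B", "C"], key=lambda x: DIST[frozenset(("query", x))])
    print("most similar to query:", closest)
    print("first join:", neighbour_joining(NAMES, DIST)[0][:2])
```

With these made-up numbers, a similarity ranking alone would point to B as the best match, while the tree groups the query with A, which is the kind of case where a phylogenetic view adds information to a list of hits.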



