OrthoSelect: a protocol for selecting orthologous groups in phylogenomics.

Author information

Schreiber Fabian, Pick Kerstin, Erpenbeck Dirk, Wörheide Gert, Morgenstern Burkhard

Affiliation

Abteilung Bioinformatik, Institut für Mikrobiologie und Genetik, Georg-August-Universität Göttingen, Göttingen, Germany.

Publication information

BMC Bioinformatics. 2009 Jul 16;10:219. doi: 10.1186/1471-2105-10-219.

DOI: 10.1186/1471-2105-10-219

PMID: 19607672

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2719630/

Abstract

BACKGROUND

Phylogenetic studies using expressed sequence tags (EST) are becoming a standard approach to answer evolutionary questions. Such studies are usually based on large sets of newly generated, unannotated, and error-prone EST sequences from different species. A first crucial step in EST-based phylogeny reconstruction is to identify groups of orthologous sequences. From these data sets, appropriate target genes are selected, and redundant sequences are eliminated to obtain suitable sequence sets as input data for tree-reconstruction software. Generating such data sets manually can be very time consuming. Thus, software tools are needed that carry out these steps automatically.
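The following minimal Python sketch makes the orthology-assignment and redundancy-removal steps concrete. It assumes that a similarity search (for example BLAST) against an OG database has already been run and that its results have been reduced to hypothetical `Hit` records holding an EST ID, species, OG ID and score. Each EST is assigned to the OG of its best-scoring hit, and within each OG only the highest-scoring EST per species is kept. Both rules are simplifying assumptions for illustration, not OrthoSelect's actual criteria.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical record: one search hit of an EST against an OG database."""
    est_id: str
    species: str
    og_id: str
    score: float  # e.g. a bit score; higher is better


def assign_to_ogs(hits: list[Hit]) -> dict[str, Hit]:
    """Assign each EST to the OG of its single best-scoring hit (assumed rule)."""
    best: dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.est_id)
        if current is None or hit.score > current.score:
            best[hit.est_id] = hit
    return best


def remove_redundant(assigned: dict[str, Hit]) -> dict[str, dict[str, str]]:
    """Keep one EST per (OG, species), the highest-scoring one (assumed rule).

    Returns {og_id: {species: est_id}}.
    """
    chosen: dict[tuple[str, str], Hit] = {}
    for hit in assigned.values():
        key = (hit.og_id, hit.species)
        if key not in chosen or hit.score > chosen[key].score:
            chosen[key] = hit
    result: dict[str, dict[str, str]] = defaultdict(dict)
    for (og_id, species), hit in chosen.items():
        result[og_id][species] = hit.est_id
    return dict(result)


# Toy input; IDs, species labels and scores are made up for illustration.
hits = [
    Hit("est1", "sp1", "OG_A", 250.0),
    Hit("est1", "sp1", "OG_B", 90.0),   # weaker hit, ignored for est1
    Hit("est2", "sp1", "OG_A", 180.0),  # redundant with est1 in OG_A
    Hit("est3", "sp2", "OG_A", 210.0),
    Hit("est4", "sp2", "OG_B", 120.0),
]
print(remove_redundant(assign_to_ogs(hits)))
# {'OG_A': {'sp1': 'est1', 'sp2': 'est3'}, 'OG_B': {'sp2': 'est4'}}
```

Keeping a single sequence per species and OG is what turns a raw set of hits into one row per taxon in each gene alignment.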

RESULTS

We developed a flexible and user-friendly software pipeline, running on desktop machines or computer clusters, that constructs data sets for phylogenomic analyses. It automatically searches assembled EST sequences against databases of orthologous groups (OGs), assigns ESTs to these predefined OGs, translates the sequences into proteins, eliminates redundant sequences assigned to the same OG, creates multiple sequence alignments of the identified orthologous sequences, and, in an optional final step, refines these alignments by excluding potentially homoplastic sites and selecting sufficiently conserved regions. Our software pipeline can be used as is, but it can also be adapted by integrating additional external programs. This makes the pipeline useful for non-bioinformaticians as well as for bioinformatics experts. The pipeline is especially designed for ESTs, but it can also handle protein sequences.
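ESTs are nucleotide fragments of unknown strand and reading frame, so translating them into proteins requires choosing a frame. The sketch below uses a simple heuristic, which is an assumption and not necessarily what OrthoSelect does: translate all six reading frames with the standard genetic code and keep the longest stretch without a stop codon. The function names are hypothetical.

```python
# Standard genetic code (NCBI translation table 1), codons ordered by TCAG.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; '*' marks stops, 'X' unknown codons."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def longest_stop_free_stretch(est: str) -> str:
    """Longest stop-free peptide over all six reading frames (assumed heuristic)."""
    est = est.upper()
    best = ""
    for strand in (est, reverse_complement(est)):
        for frame in range(3):
            for peptide in translate(strand[frame:]).split("*"):
                if len(peptide) > len(best):
                    best = peptide
    return best


print(longest_stop_free_stretch("ATGTGGTAA"))  # LPH (found on the reverse strand)
```

In this toy input the longest stop-free stretch lies on the reverse strand, which is why both strands have to be examined. On short fragments a spurious frame can easily win, so a more careful pipeline could also use the OG database hit to guide the choice of frame.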

CONCLUSION

OrthoSelect is a tool that produces orthologous gene alignments from assembled ESTs. Our tests show that OrthoSelect detects orthologs in EST libraries with high accuracy. In the absence of a gold standard for orthology prediction, we compared predictions by OrthoSelect to a manually created and published phylogenomic data set. Our tool not only rebuilt the data set with a specificity of 98% but also detected 4% more orthologous sequences. Furthermore, the results OrthoSelect produces are in complete agreement with those of other programs, while our tool offers a significant speedup and additional functionality, e.g. handling of ESTs, computing sequence alignments, and refining them. To our knowledge, there is currently no fully automated and freely available tool for this purpose. Thus, OrthoSelect is a valuable tool for researchers in the field of phylogenomics who deal with large quantities of EST sequences. OrthoSelect is written in Perl and runs on Linux and Mac OS X. The tool can be downloaded at http://gobics.de/fabian/orthoselect.php.
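The alignment-refinement step (excluding potentially homoplastic sites and keeping sufficiently conserved parts of an alignment) can be illustrated with a deliberately simple column filter. The sketch below drops columns that are mostly gaps or whose most frequent residue falls below a conservation threshold. The function name and both thresholds are illustrative assumptions; dedicated tools such as Noisy or Gblocks apply more elaborate criteria.

```python
def filter_columns(alignment: list[str],
                   max_gap_fraction: float = 0.5,
                   min_conservation: float = 0.5) -> list[str]:
    """Keep alignment columns that are mostly ungapped and sufficiently conserved.

    Conservation of a column is the frequency of its most common residue
    among the non-gap characters in that column (an assumed, simple measure).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    keep: list[int] = []
    for col in range(length):
        column = [row[col] for row in alignment]
        residues = [c for c in column if c != "-"]
        gap_fraction = 1 - len(residues) / len(column)
        if gap_fraction > max_gap_fraction or not residues:
            continue  # too gappy to be informative
        most_common = max(residues.count(r) for r in set(residues))
        if most_common / len(residues) >= min_conservation:
            keep.append(col)

    return ["".join(row[i] for i in keep) for row in alignment]


# Toy protein alignment (made up for illustration).
alignment = [
    "MKV-LLA",
    "MKI-LLG",
    "MRV-ILP",
    "MKVAL-W",
]
for row in filter_columns(alignment):
    print(row)  # prints MKVLL, MKILL, MRVIL, MKVL- (one per line)
```

With the default thresholds, the column containing three gaps and the highly variable last column are removed, while conserved columns with a single gap are retained.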

