OrthoSelect: a protocol for selecting orthologous groups in phylogenomics.

Authors

Schreiber Fabian, Pick Kerstin, Erpenbeck Dirk, Wörheide Gert, Morgenstern Burkhard

Affiliation

Abteilung Bioinformatik, Institut für Mikrobiologie und Genetik, Georg-August-Universität Göttingen, Göttingen, Germany.

Publication

BMC Bioinformatics. 2009 Jul 16;10:219. doi: 10.1186/1471-2105-10-219.

DOI:10.1186/1471-2105-10-219
PMID:19607672
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2719630/
Abstract

BACKGROUND

Phylogenetic studies using expressed sequence tags (ESTs) are becoming a standard approach to answering evolutionary questions. Such studies are usually based on large sets of newly generated, unannotated, and error-prone EST sequences from different species. A crucial first step in EST-based phylogeny reconstruction is to identify groups of orthologous sequences. From these data sets, appropriate target genes are selected, and redundant sequences are eliminated to obtain suitable sequence sets as input data for tree-reconstruction software. Generating such data sets manually can be very time-consuming. Thus, software tools are needed that carry out these steps automatically.

RESULTS

We developed a flexible and user-friendly software pipeline, running on desktop machines or computer clusters, that constructs data sets for phylogenomic analyses. It automatically searches assembled EST sequences against databases of orthologous groups (OGs), assigns ESTs to these predefined OGs, translates the sequences into proteins, eliminates redundant sequences assigned to the same OG, and creates multiple sequence alignments of the identified orthologous sequences. In an optional last step, the alignment can be processed further by excluding potentially homoplastic sites and selecting sufficiently conserved parts. Our software pipeline can be used as it is, but it can also be adapted by integrating additional external programs. This makes the pipeline useful to non-bioinformaticians as well as to bioinformatics experts. The software pipeline is especially designed for ESTs, but it can also handle protein sequences.
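The translation and redundancy-elimination steps described above can be illustrated with a minimal Python sketch. It is not OrthoSelect's code (the tool is written in Perl, and the abstract does not say how either step is carried out). As stand-ins, the sketch assumes a naive six-frame translation with the standard genetic code and a rule that keeps the highest-scoring sequence per OG and taxon. The function names, the tuple layout of `hits`, and the score field are all hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Translate three frames on each strand (forward frames first)."""
    frames = []
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            frames.append(translate(strand[offset:]))
    return frames


def longest_stop_free_segment(dna):
    """Pick the longest stretch without a stop codon across all six frames."""
    best = ""
    for protein in six_frame_translations(dna):
        for segment in protein.split("*"):
            if len(segment) > len(best):
                best = segment
    return best


def remove_redundancy(hits):
    """Keep one sequence per (OG, taxon): the one with the highest score.

    `hits` is an iterable of (og_id, taxon, seq_id, score) tuples; this layout
    and the scoring criterion are assumptions made for illustration.
    """
    best = {}
    for og_id, taxon, seq_id, score in hits:
        key = (og_id, taxon)
        if key not in best or score > best[key][1]:
            best[key] = (seq_id, score)
    return {key: seq_id for key, (seq_id, _) in best.items()}
```

Error-prone ESTs often contain frameshifts, so a production pipeline would likely rely on a dedicated translation tool rather than the naive frame scan shown here.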

CONCLUSION

OrthoSelect is a tool that produces orthologous gene alignments from assembled ESTs. Our tests show that OrthoSelect detects orthologs in EST libraries with high accuracy. In the absence of a gold standard for orthology prediction, we compared predictions by OrthoSelect to a manually created and published phylogenomic data set. Our tool was not only able to rebuild the data set with a specificity of 98%, but it also detected four percent more orthologous sequences. Furthermore, the results OrthoSelect produces are in absolute agreement with the results of other programs, but our tool offers a significant speedup and additional functionality, e.g. handling of ESTs, computing sequence alignments, and refining them. To our knowledge, there is currently no fully automated and freely available tool for this purpose. Thus, OrthoSelect is a valuable tool for researchers in the field of phylogenomics who deal with large quantities of EST sequences. OrthoSelect is written in Perl and runs on Linux and Mac OS X. The tool can be downloaded at http://gobics.de/fabian/orthoselect.php.
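A comparison against a manually curated data set of the kind mentioned above can be sketched as a simple set comparison. The abstract does not define how the 98% figure or the additional four percent were computed, so the two ratios below are assumed definitions chosen for illustration. The function name and the representation of each assignment as an (OG, taxon) pair are likewise hypothetical.

```python
def compare_to_reference(predicted, reference):
    """Compare predicted ortholog assignments with a curated reference set.

    Both arguments are iterables of hashable assignments, e.g. (og_id, taxon)
    pairs. The ratios are illustrative and may not match the paper's metrics.
    """
    predicted, reference = set(predicted), set(reference)
    if not reference:
        raise ValueError("reference set must not be empty")
    shared = predicted & reference   # assignments found in both sets
    extra = predicted - reference    # predictions absent from the reference
    return {
        "recovered_fraction": len(shared) / len(reference),
        "additional_fraction": len(extra) / len(reference),
        "missed": sorted(reference - predicted),
    }
```

For example, a prediction that recovers three of four reference assignments and adds one new assignment yields a recovered fraction of 0.75 and an additional fraction of 0.25.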

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a82/2719630/1bf174c0f355/1471-2105-10-219-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a82/2719630/a38ec727d176/1471-2105-10-219-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a82/2719630/8e53523f742d/1471-2105-10-219-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a82/2719630/4d5bc44878e7/1471-2105-10-219-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a82/2719630/1954d39d376d/1471-2105-10-219-5.jpg
