Speeding up all-against-all protein comparisons while maintaining sensitivity by considering subsequence-level homology.

Affiliations

University College London, London, United Kingdom.

Swiss Institute of Bioinformatics, Zurich, Switzerland.

Publication information

PeerJ. 2014 Oct 7;2:e607. doi: 10.7717/peerj.607. eCollection 2014.

DOI: 10.7717/peerj.607
PMID: 25320677
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4193403/
Abstract

Orthology inference and other sequence analyses across multiple genomes typically start by performing exhaustive pairwise sequence comparisons, a process referred to as "all-against-all". As this process scales quadratically with the number of sequences analysed, this step can become a bottleneck, thus limiting the number of genomes that can be analysed simultaneously. Here, we explored ways of speeding up the all-against-all step while maintaining its sensitivity. By exploiting the transitivity of homology and, crucially, ensuring that homology is defined in terms of consistent protein subsequences, our proof-of-concept resulted in a 4× speedup while recovering >99.6% of all homologs identified by the full all-against-all procedure on empirical sequence sets. In comparison, state-of-the-art k-mer approaches are orders of magnitude faster but only recover 3-14% of all homologous pairs. We also outline ideas to further improve the speed and recall of the new approach. An open-source implementation is provided as part of the OMA standalone software at http://omabrowser.org/standalone.
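
The abstract does not spell out the algorithm, so the following is only a toy sketch of the general idea, not the OMA implementation. It assumes sequences have already been aligned to a cluster representative, and it proposes two members as candidate homologs only when their alignments cover a shared region of that representative, which is one way of making transitivity depend on consistent subsequences. The data layout, the function names, and the `min_overlap` threshold are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical input layout: for each representative, the members that aligned
# to it and the interval they cover on the representative (end exclusive).
Hit = Tuple[str, int, int]  # (member_id, start_on_rep, end_on_rep)


def all_pairs_count(n: int) -> int:
    """Number of comparisons in a full all-against-all of n sequences."""
    return n * (n - 1) // 2


def overlap(a: Hit, b: Hit) -> int:
    """Length of the representative's region covered by both hits."""
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def candidate_pairs(
    clusters: Dict[str, List[Hit]], min_overlap: int = 30
) -> Set[Tuple[str, str]]:
    """Propose member pairs whose hits share a region of the same representative.

    min_overlap is an illustrative threshold, not a value from the paper.
    """
    pairs: Set[Tuple[str, str]] = set()
    for hits in clusters.values():
        for a, b in combinations(hits, 2):
            if a[0] != b[0] and overlap(a, b) >= min_overlap:
                first, second = sorted((a[0], b[0]))
                pairs.add((first, second))
    return pairs


# A and B cover a common stretch of rep1; C aligns to a different region,
# so it is not proposed as a homolog of A or B through rep1.
clusters = {"rep1": [("A", 0, 100), ("B", 50, 150), ("C", 200, 300)]}
proposed = candidate_pairs(clusters)
```

In this toy input, only the pair A and B would go on to a full alignment, whereas an exhaustive comparison of the four sequences (three members plus the representative) needs `all_pairs_count(4)` alignments. Requiring a shared region avoids linking two proteins that merely match different domains of the same representative.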

