Domain similarity based orthology detection.

Author information

Bitard-Feildel Tristan, Kemena Carsten, Greenwood Jenny M, Bornberg-Bauer Erich

Affiliation

Institute for Evolution and Biodiversity, University of Münster, Hüfferstr. 1, Münster, Germany.

Publication information

BMC Bioinformatics. 2015 May 13;16:154. doi: 10.1186/s12859-015-0570-8.

DOI:10.1186/s12859-015-0570-8

PMID:25968113

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4443542/

Abstract

BACKGROUND

Orthologous protein detection software mostly uses pairwise comparisons of amino-acid sequences to determine whether two proteins are orthologous. Consequently, as the number of sequences to compare increases, the number of comparisons to compute grows quadratically. A current challenge in bioinformatics, especially given the increasing number of sequenced organisms available, is to keep this ever-growing number of comparisons computationally feasible within a reasonable amount of time. We propose to speed up the detection of orthologous proteins by characterizing proteins as strings of domains.
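To make the quadratic growth concrete, here is a short Python sketch that counts the unordered pairs an all-against-all search has to score, n(n-1)/2 for n sequences. The sequence counts are arbitrary illustrative values, and tools that search in both directions (A against B and B against A) would roughly double these numbers.

```python
def all_vs_all_pairs(n_sequences: int) -> int:
    """Number of unordered pairwise comparisons among n sequences: n(n-1)/2."""
    return n_sequences * (n_sequences - 1) // 2


# Doubling the number of sequences roughly quadruples the number of comparisons.
for n in (10_000, 20_000, 40_000):
    print(n, all_vs_all_pairs(n))
```

For 10,000 sequences this gives 49,995,000 pairs, and for 20,000 sequences it gives 199,990,000, close to four times as many.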

RESULTS

We present two new protein similarity measures based on domain content similarity, a cosine score and a maximal weight matching score, together with new software named porthoDom. The quality of the cosine and maximal weight matching similarity measures is assessed against curated datasets. The measures show that domain content similarity can correctly group proteins into their families. Accordingly, the cosine similarity measure is used in porthoDom, a wrapper developed for proteinortho. porthoDom uses domain content similarity to group proteins together before searching for orthologs. Using domains instead of amino acid sequences reduces the search space, which lowers the computational complexity of an all-against-all sequence comparison.
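The abstract does not give the exact formula porthoDom uses, so the following is only a minimal Python sketch of a cosine similarity over domain content. It assumes that each protein is reduced to counts of its domain identifiers (for example Pfam accessions), with domain order ignored; the function name `domain_cosine` is hypothetical.

```python
from collections import Counter
from math import sqrt


def domain_cosine(domains_a: list[str], domains_b: list[str]) -> float:
    """Cosine similarity between two proteins' domain-count vectors.

    Each protein is treated as a bag of domain identifiers (assumption:
    e.g. Pfam accessions); domain order is ignored in this sketch.
    """
    counts_a, counts_b = Counter(domains_a), Counter(domains_b)
    # Dot product over the domains both proteins contain.
    dot = sum(counts_a[d] * counts_b[d] for d in counts_a.keys() & counts_b.keys())
    norm_a = sqrt(sum(c * c for c in counts_a.values()))
    norm_b = sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a protein without annotated domains shares no domain content
    return dot / (norm_a * norm_b)


# A kinase + SH2 protein compared with a kinase-only protein.
print(domain_cosine(["PF00069", "PF00017"], ["PF00069"]))
```

Identical domain content scores about 1.0, disjoint content scores 0.0, and the example above scores about 0.707 (1/√2). Because the score depends only on shared identifiers, it is far cheaper to compute than a sequence alignment.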

CONCLUSION

We demonstrate that representing and comparing proteins as strings of discrete domains, i.e. as a concatenation of their unique identifiers, drastically simplifies the search space. porthoDom speeds up orthology detection while maintaining an accuracy similar to that of proteinortho. porthoDom is implemented in Python and C++ and is available under the GNU GPL licence, version 3, at http://www.bornberglab.org/pages/porthoda .
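As a rough illustration of how a domain-string representation can shrink the search space, the following Python sketch buckets proteins by the exact concatenation of their domain identifiers and counts the pairwise comparisons left if sequence comparison were restricted to proteins in the same bucket. This exact-match grouping is a simplification chosen for brevity and is not porthoDom's procedure, which according to the abstract groups proteins with the cosine similarity measure. The protein names, domain assignments and helper functions are hypothetical.

```python
from collections import defaultdict


def domain_string(domains: list[str]) -> str:
    """Concatenate domain identifiers in N- to C-terminal order."""
    return "|".join(domains)


def group_by_domain_string(proteins: dict[str, list[str]]) -> dict[str, list[str]]:
    """Bucket protein names by identical domain strings (hypothetical helper).

    Proteins with no annotated domains all share the empty key here;
    this sketch does not treat them specially.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for name, domains in proteins.items():
        groups[domain_string(domains)].append(name)
    return dict(groups)


def pairs_within_groups(groups: dict[str, list[str]]) -> int:
    """Pairwise comparisons needed if only proteins in the same group are compared."""
    return sum(len(members) * (len(members) - 1) // 2 for members in groups.values())


# Illustrative (made-up) domain annotations for six proteins.
proteins = {
    "p1": ["PF00069", "PF00017"],
    "p2": ["PF00069", "PF00017"],
    "p3": ["PF00069"],
    "p4": ["PF00069"],
    "p5": ["PF00018"],
    "p6": ["PF00018", "PF00017"],
}
groups = group_by_domain_string(proteins)
n = len(proteins)
print(n * (n - 1) // 2, pairs_within_groups(groups))  # all-against-all vs. within-group
```

Here the full all-against-all search needs 15 comparisons, while the within-group search needs only 2. Exact matching is deliberately strict: a similarity threshold such as the cosine score would merge related but non-identical arrangements, at the cost of more comparisons.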
