Domain similarity based orthology detection.

Author information

Bitard-Feildel Tristan, Kemena Carsten, Greenwood Jenny M, Bornberg-Bauer Erich

Affiliation

Institute for Evolution and Biodiversity, University of Münster, Hüfferstr. 1, Münster, Germany.

Publication information

BMC Bioinformatics. 2015 May 13;16:154. doi: 10.1186/s12859-015-0570-8.

Abstract

BACKGROUND

Most orthology detection software relies on pairwise comparisons of amino-acid sequences to determine whether two proteins are orthologous. Consequently, the number of comparisons to compute grows quadratically with the number of sequences. Given the increasing number of sequenced organisms available, a current challenge in bioinformatics is to keep this ever-growing number of comparisons computationally feasible within a reasonable amount of time. We propose to speed up the detection of orthologous proteins by characterizing each protein as a string of domains.
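To make the quadratic growth concrete, the sketch below counts the unordered pairs in a naive all-against-all comparison of n sequences, n(n-1)/2. It assumes every sequence is compared with every other one; real tools such as proteinortho may restrict or prune comparisons, so the function name and setting here are illustrative only.

```python
def pairwise_comparisons(n_sequences: int) -> int:
    """Unordered pairs in a naive all-against-all comparison: n(n-1)/2."""
    return n_sequences * (n_sequences - 1) // 2


# Ten times more sequences means roughly one hundred times more comparisons.
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} sequences -> {pairwise_comparisons(n):>13,} comparisons")
```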

RESULTS

We present two new protein similarity measures based on domain content similarity, a cosine score and a maximal weight matching score, together with new software named porthoDom. The quality of the cosine and maximal weight matching measures is assessed against curated datasets, and the results show that domain content similarity is able to correctly group proteins into their families. Accordingly, the cosine similarity measure is used inside porthoDom, a wrapper developed for proteinortho. porthoDom uses domain content similarity to group proteins together before searching for orthologs. Using domains instead of amino-acid sequences reduces the search space and thereby decreases the computational complexity of an all-against-all sequence comparison.
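A cosine score over domain content can be sketched as follows. This is a minimal illustration, assuming each protein is reduced to the multiset of its domain identifiers and compared as a vector of domain counts; the function name `domain_cosine` and the identifiers `D1`, `D2` are hypothetical, and porthoDom's actual weighting and thresholds may differ. The maximal weight matching score is not sketched here.

```python
from collections import Counter
from math import sqrt
from typing import Sequence


def domain_cosine(domains_a: Sequence[str], domains_b: Sequence[str]) -> float:
    """Cosine similarity between the domain-count vectors of two proteins."""
    counts_a, counts_b = Counter(domains_a), Counter(domains_b)
    # Only domains present in both proteins contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[d] * counts_b[d] for d in shared)
    norm_a = sqrt(sum(c * c for c in counts_a.values()))
    norm_b = sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a protein with no annotated domains shares no domain content
    return dot / (norm_a * norm_b)


# Hypothetical domain annotations: a repeated D1 domain counts twice.
print(domain_cosine(["D1", "D1", "D2"], ["D1", "D2"]))
```

Because the score compares domain counts, it ignores domain order, which keeps it cheap to compute compared with a sequence alignment.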

CONCLUSION

We demonstrate that representing and comparing proteins as strings of discrete domains, i.e. as a concatenation of their unique identifiers, drastically simplifies the search space. porthoDom speeds up orthology detection while maintaining accuracy similar to that of proteinortho. porthoDom is implemented in Python and C++ and is available under the GNU GPL licence version 3 at http://www.bornberglab.org/pages/porthoda .
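The search-space reduction can be illustrated with a deliberately simplified sketch: each protein's ordered domain identifiers are concatenated into one string, proteins sharing an identical string are grouped, and only proteins within the same group are compared. Exact string matching is a stand-in chosen for clarity, not porthoDom's similarity-based grouping; the `|` separator, the helper names, and the toy data are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def domain_string(domains: List[str], sep: str = "|") -> str:
    """Concatenate a protein's ordered domain identifiers (hypothetical format)."""
    return sep.join(domains)


def group_by_architecture(proteins: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each domain string to the IDs of proteins that share it exactly."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for protein_id, domains in proteins.items():
        groups[domain_string(domains)].append(protein_id)
    return dict(groups)


def comparisons_within_groups(groups: Dict[str, List[str]]) -> int:
    """Pairwise comparisons remaining if only same-group proteins are compared."""
    return sum(len(ids) * (len(ids) - 1) // 2 for ids in groups.values())


proteins = {
    "p1": ["D1", "D2"],
    "p2": ["D1", "D2"],
    "p3": ["D3"],
    "p4": ["D3"],
    "p5": ["D2", "D1"],  # same domains as p1 but in another order: a different string
}
groups = group_by_architecture(proteins)
n = len(proteins)
print(n * (n - 1) // 2, "all-against-all vs", comparisons_within_groups(groups), "within groups")
```

With these five toy proteins, all-against-all needs 10 comparisons while the grouped search needs 2. A real pipeline would also need a policy for proteins without annotated domains, which this sketch leaves open.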
