Homology Detection Using Multilayer Maximum Clustering Coefficient.

Authors

Santiago Caio, Pereira Vivian, Digiampietri Luciano

Affiliations

1 Bioinformatics, University of São Paulo, São Paulo, Brazil.

2 School of Arts, Sciences and Humanities, University of São Paulo, São Paulo, Brazil.

Publication information

J Comput Biol. 2018 Aug 13. doi: 10.1089/cmb.2017.0266.

Abstract

Homologous sequences are widely used to understand the functions of certain genes or proteins. However, there is no consensus on how to solve the problem of automatically assigning functions to proteins, and different algorithms identify homologous clusters in a given set of sequences in different ways. In this article, we present an algorithm for one specific kind of set: the coding sequences obtained from phylogenetically close genomes (of the same species, genus, or family). When modeled as a graph, these sets have their own characteristics: they form more homogeneous and denser clusters. To identify homologous clusters in such sets, our algorithm makes use of the clustering coefficient, whose maximization can lead to the results expected from the biological point of view. In addition, we present an algorithm for the identification of sequence domains based on graph topology. We also compare our results with those of TribeMCL, a well-established tool in the field.
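The abstract does not spell out how the clustering coefficient is maximized, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a hypothetical setup in which sequences are nodes, pairwise similarity scores are weighted edges, and a score cutoff is chosen from a list of candidates so that the average local clustering coefficient of the resulting graph is as high as possible. The names `build_graph`, `best_threshold`, and `EDGES`, and the toy scores, are all illustrative assumptions; the multilayer aspect of the published method is not modeled here.

```python
from itertools import combinations

def build_graph(edges, threshold):
    """Build an undirected adjacency map, keeping edges with score >= threshold.

    `edges` is an iterable of (u, v, score) tuples (hypothetical similarity scores).
    Every node is kept, even if all of its edges fall below the threshold.
    """
    adj = {}
    for u, v, score in edges:
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if u != v and score >= threshold:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def local_clustering(adj, node):
    """Fraction of pairs of neighbors of `node` that are themselves connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    if not adj:
        return 0.0
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

def best_threshold(edges, thresholds):
    """Return the candidate cutoff whose graph has the highest average clustering."""
    return max(thresholds, key=lambda t: average_clustering(build_graph(edges, t)))

# Toy data (invented): two tight groups of three sequences joined by one weak hit.
EDGES = [
    ("a", "b", 90), ("a", "c", 88), ("b", "c", 92),
    ("d", "e", 85), ("d", "f", 87), ("e", "f", 91),
    ("c", "d", 40),  # weak cross-group similarity
]

if __name__ == "__main__":
    for t in (30, 50, 95):
        print(t, round(average_clustering(build_graph(EDGES, t)), 4))
    print("chosen cutoff:", best_threshold(EDGES, [30, 50, 95]))
```

In this toy case the weak cross-group edge lowers the coefficient at the loosest cutoff, and the strictest cutoff removes every edge, so the intermediate cutoff that separates the two dense groups scores highest. This mirrors the intuition in the abstract that closely related genomes yield dense, homogeneous clusters.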
