Suppr超能文献

用于最佳匹配图编辑的启发式算法。

Heuristic algorithms for best match graph editing.

作者信息

Schaller David, Geiß Manuela, Hellmuth Marc, Stadler Peter F

机构信息

Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04109 Leipzig, Leipzig, Germany.

Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, D-04107, Leipzig, Germany.

出版信息

Algorithms Mol Biol. 2021 Aug 17;16(1):19. doi: 10.1186/s13015-021-00196-3.

Abstract

BACKGROUND

Best match graphs (BMGs) are a class of colored digraphs that naturally appear in mathematical phylogenetics as a representation of the pairwise most closely related genes among multiple species. An arc connects a gene x with a gene y from another species (vertex color) Y whenever it is one of the phylogenetically closest relatives of x. BMGs can be approximated with the help of similarity measures between gene sequences, albeit not without errors. Empirical estimates thus will usually violate the theoretical properties of BMGs. The corresponding graph editing problem can be used to guide error correction for best match data. Since the arc set modification problems for BMGs are NP-complete, efficient heuristics are needed if BMGs are to be used for the practical analysis of biological sequence data.

RESULTS

Since BMGs have a characterization in terms of consistency of a certain set of rooted triples (binary trees on three vertices) defined on the set of genes, we consider heuristics that operate on triple sets. As an alternative, we show that there is a close connection to a set partitioning problem that leads to a class of top-down recursive algorithms that are similar to Aho's supertree algorithm and give rise to BMG editing algorithms that are consistent in the sense that they leave BMGs invariant. Extensive benchmarking shows that community detection algorithms for the partitioning steps perform best for BMG editing.

CONCLUSION

Noisy BMG data can be corrected with sufficient accuracy and efficiency to make BMGs an attractive alternative to classical phylogenetic methods.

摘要

背景

最佳匹配图(BMGs)是一类彩色有向图,在数学系统发育学中自然出现,用于表示多个物种中两两关系最密切的基因。只要基因x的系统发育最近亲属之一是来自另一个物种(顶点颜色)Y的基因y,就会有一条弧将基因x与基因y相连。借助基因序列之间的相似性度量可以近似得到BMGs,尽管并非没有误差。因此,经验估计通常会违反BMGs的理论属性。相应的图编辑问题可用于指导最佳匹配数据的纠错。由于BMGs的弧集修改问题是NP完全问题,若要将BMGs用于生物序列数据的实际分析,就需要高效的启发式算法。

结果

由于BMGs可以根据在基因集上定义的某组有根三元组(三个顶点的二叉树)的一致性来表征,我们考虑对三元组集进行操作的启发式算法。作为一种替代方法,我们表明它与一个集合划分问题有密切联系,这导致了一类自顶向下的递归算法,这些算法类似于Aho的超级树算法,并产生了在使BMGs不变的意义上保持一致的BMG编辑算法。广泛的基准测试表明,用于划分步骤的社区检测算法在BMG编辑方面表现最佳。

结论

有噪声的BMG数据可以以足够的准确性和效率进行校正,从而使BMGs成为经典系统发育方法的有吸引力的替代方法。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98ca/8369769/9d0f41401a0b/13015_2021_196_Fig1_HTML.jpg

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验