Unbiased anchors for reliable genome-wide synteny detection.

Author information

Käther Karl K, Remmel Andreas, Lemke Steffen, Stadler Peter F

Affiliations

Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, Härtelstrasse 16-18, D-04017, Leipzig, Germany.

Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM, 87501, USA.

Publication information

Algorithms Mol Biol. 2025 Apr 5;20(1):5. doi: 10.1186/s13015-025-00275-9.

DOI: 10.1186/s13015-025-00275-9

PMID: 40188341

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11972476/

Abstract

Orthology inference lies at the foundation of comparative genomics research. The correct identification of loci that descended from a common ancestral sequence is complicated not only by sequence divergence but also by duplication and other genome rearrangements. The conservation of gene order, i.e., synteny, is used in conjunction with sequence similarity as an additional factor for orthology determination. Current approaches, however, rely on genome annotations and are therefore limited. Here we present an annotation-free approach and compare it to synteny analysis with annotations. We find that our approach performs better for closely related genomes, whereas the annotation-based analysis performs better for more distantly related genomes. Overall, the presented algorithm offers a useful alternative to annotation-based methods and can outperform them in many cases.
