Käther Karl K, Remmel Andreas, Lemke Steffen, Stadler Peter F
Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, Härtelstrasse 16-18, D-04017, Leipzig, Germany.
Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM, 87501, USA.
Algorithms Mol Biol. 2025 Apr 5;20(1):5. doi: 10.1186/s13015-025-00275-9.
Orthology inference lies at the foundation of comparative genomics research. The correct identification of loci that descend from a common ancestral sequence is complicated not only by sequence divergence but also by duplications and other genome rearrangements. The conservation of gene order, i.e., synteny, is used in conjunction with sequence similarity as an additional criterion for orthology determination. Current approaches, however, rely on genome annotations and are therefore limited. Here we present an annotation-free approach and compare it to annotation-based synteny analysis. We find that our approach performs better for closely related genomes, whereas the annotation-based analysis performs better for more distantly related genomes. Overall, the presented algorithm offers a useful alternative to annotation-based methods and can outperform them in many cases.