SANS ambages: phylogenomics with abundance-filter, multi-threading, and bootstrapping on amino-acid or genomic sequences.

Author information

Kolesch Fabian, Sohn Marco, Rempel Andreas, Hippel Pia, Wittler Roland

Affiliations

Genome Informatics, Faculty of Technology and Center for Biotechnology, Bielefeld University, 33615, Bielefeld, Germany.

Bielefeld Institute for Bioinformatics Infrastructure (BIBI), Bielefeld University, 33615, Bielefeld, Germany.

Publication information

BMC Bioinformatics. 2025 Sep 2;26(1):227. doi: 10.1186/s12859-025-06204-2.

DOI: 10.1186/s12859-025-06204-2

PMID: 40898043

Abstract

BACKGROUND

The increasing amount of available genome sequence data enables large-scale comparative studies. A common task is the inference of phylogenies, which is challenging if close reference sequences are not available, genome sequences are incompletely assembled, or the high number of genomes precludes multiple sequence alignment in reasonable time. SANS is an alignment-free, whole-genome-based approach for phylogeny estimation.

RESULTS

Here we present a new implementation, SANS ambages, with a significantly broader application spectrum. It supports additional types of input data, parallelized processing, and bootstrapping. The source code (C++), documentation, and example data are freely available for download at: https://github.com/gi-bielefeld/sans . SANS can also be launched via the web interface of the CloWM platform, free of charge with a standard Life Science account: https://clowm.bi.denbi.de/workflows/0194b78f-9696-7402-a2b8-858508733618/ .

CONCLUSIONS

Through parallelization, the new version shortens processing time on large datasets immensely. In addition, the ability to process amino-acid sequences and a filter for low-abundance DNA read segments enable new application cases. Bootstrapping and integrated visualization ease and enrich the interpretation of the resulting phylogenies.
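The abstract does not describe how the abundance filter works internally. As a minimal sketch of the general idea only, not the SANS implementation, assume that read segments are k-mers, that a k-mer and its reverse complement are counted together, and that a k-mer is discarded when it occurs fewer than a chosen number of times across all reads (the names `canonical`, `abundant_kmers`, and `min_count` are hypothetical). A common rationale for such filters is that very rare k-mers in raw reads often stem from sequencing errors.

```python
from collections import Counter

# Complement table for the four DNA bases (hypothetical helper, not SANS code).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def abundant_kmers(reads, k: int, min_count: int) -> set:
    """Count canonical k-mers over all reads; keep those seen at least min_count times."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers containing N or any other non-ACGT symbol.
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    # Low-abundance k-mers (fewer than min_count occurrences) are filtered out.
    return {kmer for kmer, count in counts.items() if count >= min_count}


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTAC", "TTTTGA"]
    print(sorted(abundant_kmers(reads, k=3, min_count=2)))  # ['AAA', 'ACG', 'GTA']
```

In this toy input, the k-mers `CAA` and `TCA` (canonical forms of `TTG` and `TGA`) occur only once and are dropped, while `AAA` survives because `TTT` appears twice within the same read.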
