Suppr超能文献

稀疏邻接法:使用稀疏距离矩阵进行快速系统发育推断。

Sparse Neighbor Joining: rapid phylogenetic inference using a sparse distance matrix.

作者信息

Kurt Semih, Bouchard-Côté Alexandre, Lagergren Jens

机构信息

School of EECS and SciLifeLab, KTH Royal Institute of Technology, Stockholm, 100 44, Sweden.

Department of Statistics, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada.

出版信息

Bioinformatics. 2024 Nov 28;40(12). doi: 10.1093/bioinformatics/btae701.

Abstract

MOTIVATION

Phylogenetic reconstruction is a fundamental problem in computational biology. The Neighbor Joining (NJ) algorithm offers an efficient distance-based solution to this problem, which often serves as the foundation for more advanced statistical methods. Despite prior efforts to enhance the speed of NJ, the computation of the n2 entries of the distance matrix, where n is the number of phylogenetic tree leaves, continues to pose a limitation in scaling NJ to larger datasets.

RESULTS

In this work, we propose a new algorithm which does not require computing a dense distance matrix. Instead, it dynamically determines a sparse set of at most O(n log n) distance matrix entries to be computed in its basic version, and up to O(n log 2n) entries in an enhanced version. We show by experiments that this approach reduces the execution time of NJ for large datasets, with a trade-off in accuracy.

AVAILABILITY AND IMPLEMENTATION

Sparse Neighbor Joining is implemented in Python and freely available at https://github.com/kurtsemih/SNJ.

摘要

动机

系统发育重建是计算生物学中的一个基本问题。邻接法(NJ)算法为该问题提供了一种基于距离的有效解决方案,它常常作为更先进统计方法的基础。尽管之前努力提高NJ的速度,但距离矩阵中n²个条目的计算(其中n是系统发育树叶子的数量)仍然限制了将NJ扩展到更大的数据集。

结果

在这项工作中,我们提出了一种新算法,该算法不需要计算密集的距离矩阵。相反,它在其基本版本中动态确定最多O(n log n)个距离矩阵条目的稀疏集,在增强版本中最多确定O(n log 2n)个条目。我们通过实验表明,这种方法减少了NJ在处理大型数据集时的执行时间,但在准确性上有所权衡。

可用性和实现

稀疏邻接法用Python实现,可在https://github.com/kurtsemih/SNJ上免费获取。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/54a2/11637600/7aaeb0fd200f/btae701f1.jpg

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型,支持多种主流文档格式。

立即体验