Sudip Sharma, Sudhir Kumar
Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA.
Department of Biology, Temple University, Philadelphia, PA.
Nat Comput Sci. 2021 Sep;1(9):573-577. doi: 10.1038/s43588-021-00129-5. Epub 2021 Sep 22.
Felsenstein's bootstrap approach is widely used to assess confidence in species relationships inferred from multiple sequence alignments. It resamples sites randomly with replacement to build alignment replicates of the same size as the original alignment and infers a phylogeny from each replicate dataset. The proportion of phylogenies recovering the same grouping of species is its bootstrap confidence limit. However, the standard bootstrap imposes a high computational burden in applications involving long sequence alignments. Here, we introduce the bag of little bootstraps approach to phylogenetics, which bootstraps only a few little samples, each containing a small subset of sites. We report that median bagging of the bootstrap confidence limits from little samples produces confidence in inferred species relationships similar to that of the standard bootstrap, but in a fraction of the computational time and memory. Therefore, the little bootstraps approach can potentially enhance the rigor, efficiency, and parallelization of big data phylogenomic analyses.
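The procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several things here are assumptions made only for illustration: the function and parameter names (`quartet_split`, `little_bootstrap_support`, `g`, `n_samples`, `n_reps`), the choice of little-sample size as the alignment length raised to the power `g`, and the upsampling of each little sample back to the full alignment length, which follows the general bag of little bootstraps recipe rather than anything stated in the abstract. Tree inference is replaced by a toy four-taxon parsimony rule; a real analysis would call a phylogenetic inference program at that step.

```python
import random
from collections import Counter
from statistics import median


def quartet_split(columns, weights):
    """Toy stand-in for tree inference on exactly four taxa (illustrative assumption).

    Returns the grouping of taxon 0 with one other taxon that is supported by
    the largest weighted count of parsimony-informative sites. Ties go to the
    first candidate in the dictionary.
    """
    score = {frozenset({0, 1}): 0, frozenset({0, 2}): 0, frozenset({0, 3}): 0}
    for (a, b, c, d), w in zip(columns, weights):
        if a == b and c == d and a != c:
            score[frozenset({0, 1})] += w
        elif a == c and b == d and a != b:
            score[frozenset({0, 2})] += w
        elif a == d and b == c and a != b:
            score[frozenset({0, 3})] += w
    return max(score, key=score.get)


def little_bootstrap_support(columns, target_split, g=0.7, n_samples=5,
                             n_reps=50, seed=0):
    """Median-bagged bootstrap support for target_split (hypothetical helper)."""
    rng = random.Random(seed)
    full_len = len(columns)
    little_len = max(1, round(full_len ** g))  # assumed little-sample size
    supports = []
    for _ in range(n_samples):
        # Little sample: a subset of sites drawn without replacement.
        little = rng.sample(columns, little_len)
        hits = 0
        for _ in range(n_reps):
            # Bootstrap replicate: draw full_len sites with replacement from
            # the little sample, stored compactly as per-site counts.
            counts = Counter(rng.choices(range(little_len), k=full_len))
            weights = [counts[i] for i in range(little_len)]
            if quartet_split(little, weights) == target_split:
                hits += 1
        supports.append(hits / n_reps)
    # Median bagging: summarize the per-sample supports by their median.
    return median(supports), supports


# Synthetic alignment columns for four taxa (made-up data, not from the paper).
columns = ([("A", "A", "C", "C")] * 60 + [("A", "C", "A", "C")] * 10
           + [("G", "G", "G", "G")] * 130)
med, per_sample = little_bootstrap_support(columns, frozenset({0, 1}))
print(med, per_sample)
```

Because each little sample is processed independently, the outer loop is the natural place to parallelize, and storing replicates as site counts over the little sample is what keeps memory use small relative to resampling the full alignment.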