Wright Robyn J, Langille Morgan G I
Department of Pharmacology, Faculty of Medicine, Dalhousie University, Halifax, Nova Scotia, B3H 4R2, Canada.
Bioinformatics. 2025 May 6;41(5). doi: 10.1093/bioinformatics/btaf269.
PICRUSt2 is a bioinformatic tool that predicts microbial functions from amplicon sequencing data using a database of annotated reference genomes. We have constructed an updated database for PICRUSt2 that substantially increases the number of bacterial (19,493 to 26,868) and archaeal (406 to 1,002) genomes, as well as the number of functional annotations. The previous PICRUSt2 database relied on many time-consuming and computationally intensive manual processes that made it difficult to update. We built a new, streamlined process that allows the PICRUSt2 database to be upgraded regularly on an ongoing basis, and used this process to create a new database, PICRUSt2-SC (Sugar-Coated). Additionally, we have shown that this updated database contains genomes that more closely match study sequences from a range of different environments. Because the genomes in the database better represent these environments, the functional annotations predicted by PICRUSt2 are improved.
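To put the reported genome counts in proportion, the short sketch below computes the relative increase for each domain and for the database as a whole. It uses only the counts stated above; the names `GENOME_COUNTS` and `percent_increase` are illustrative and are not part of PICRUSt2.

```python
# Genome counts reported above: (previous PICRUSt2 database, PICRUSt2-SC).
# GENOME_COUNTS and percent_increase are illustrative names, not PICRUSt2 code.
GENOME_COUNTS = {
    "bacteria": (19_493, 26_868),
    "archaea": (406, 1_002),
}


def percent_increase(old: int, new: int) -> float:
    """Relative increase from old to new, expressed as a percentage."""
    return (new - old) / old * 100


for domain, (old, new) in GENOME_COUNTS.items():
    print(f"{domain}: {old:,} -> {new:,} "
          f"(+{new - old:,}, +{percent_increase(old, new):.1f}%)")

# Combined bacterial + archaeal genomes before and after the update.
old_total = sum(old for old, _ in GENOME_COUNTS.values())
new_total = sum(new for _, new in GENOME_COUNTS.values())
print(f"total: {old_total:,} -> {new_total:,} "
      f"(+{percent_increase(old_total, new_total):.1f}%)")
```

Running it prints roughly a 37.8% increase for bacteria, a 146.8% increase for archaea, and a 40.1% increase overall (19,899 to 27,870 genomes), so the archaeal portion of the database grew proportionally far more than the bacterial portion.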
PICRUSt2 source code is freely available at https://github.com/picrust/picrust2 and at https://anaconda.org/bioconda/picrust2. The latest version of PICRUSt2 at the time of writing is also archived at https://doi.org/10.5281/zenodo.15119781. The PICRUSt2-SC database comes pre-installed with PICRUSt2 from version 2.6.0 onwards. Step-by-step instructions for making the updated database are at https://github.com/picrust/picrust2/wiki/Updating-the-PICRUSt2-database. All code used for the analyses and figures in this manuscript is at https://github.com/R-Wright-1/PICRUSt2-SC_application_note and https://doi.org/10.5281/zenodo.15119770.