Seo Chang Wan, Yoo Shinnam, Cho Yoonhee, Kim Ji Seon, Steinegger Martin, Lim Young Woon
School of Biological Sciences, Seoul National University, Seoul 08826, Republic of Korea.
Institute of Biodiversity, Seoul National University, Seoul 08826, Republic of Korea.
J Microbiol. 2025 Apr;63(4):e2411017. doi: 10.71150/jm.2411017. Epub 2025 Apr 29.
The growth of sequence data in public nucleotide databases has made DNA sequence-based identification an indispensable tool for identifying fungi. However, the large proportion of mislabeled sequences in these databases leads to frequent misidentifications. Inaccurate identification causes severe problems, especially for industrial and clinical fungi and for edible mushrooms. Existing species identification pipelines require users to separately validate datasets obtained from public databases, which contain mislabeled taxonomic identifications. To address this issue, we developed FunVIP, a fully automated phylogeny-based fungal validation and identification pipeline (https://github.com/Changwanseo/FunVIP). FunVIP performs phylogeny-based identification with validation, and it needs only a query, a database, and a single command to produce a result. The FunVIP command runs a nine-step workflow: input management, sequence-set organization, alignment, trimming, concatenation, model selection, tree inference, tree interpretation, and report generation. Users obtain identification results, phylogenetic trees as supporting evidence, and reports of the conflicts and issues detected at multiple checkpoints during the analysis. We demonstrated FunVIP's ability to validate conflicting samples by reproducing the manual revision of Fuscoporia, a fungal genus whose database contains mislabeled sequences. We also compared the identification performance of FunVIP with that of BLAST and q2-feature-classifier on two large double-revised fungal datasets, Sanghuangporus and Aspergillus section Terrei. With its automatic validation ability and high identification performance, FunVIP is a highly promising tool for easy and accurate fungal identification.
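The idea behind phylogeny-based identification with validation can be illustrated with a toy sketch. This is not FunVIP's code or algorithm. The tree encoding (nested tuples), the function names, and the labels "sp. A" and "sp. B" are all hypothetical and chosen only for illustration. The sketch assumes a rooted tree, flags a species as conflicting when its database sequences do not form an exclusive clade, and assigns a query the species of the smallest clade that contains it together with at least one labeled sequence.

```python
# Toy illustration only; not FunVIP's implementation.
# A tree is a nested tuple; a leaf is a sequence ID (str).

def leaves(node):
    """Return the set of leaf IDs under a node."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node:
        found |= leaves(child)
    return found


def clades(node):
    """Yield the leaf set of every node (internal nodes and leaves)."""
    yield leaves(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def non_monophyletic_species(tree, labels):
    """List species whose labeled sequences do not form an exclusive clade."""
    labeled = set(labels)
    # Unlabeled queries are ignored when comparing clade membership.
    labeled_clades = [clade & labeled for clade in clades(tree)]
    conflicts = []
    for species in sorted(set(labels.values())):
        members = {seq for seq, sp in labels.items() if sp == species}
        if members not in labeled_clades:
            conflicts.append(species)
    return conflicts


def identify(tree, labels, query):
    """Name a query after the smallest clade holding it and a labeled leaf."""
    best = None
    for clade in clades(tree):
        if query in clade and any(leaf in labels for leaf in clade):
            if best is None or len(clade) < len(best):
                best = clade
    if best is None:
        return None
    species = {labels[leaf] for leaf in best if leaf in labels}
    # An ambiguous neighborhood (more than one species) yields no name.
    return species.pop() if len(species) == 1 else None


# Hypothetical data: X1 carries a wrong database label.
tree = ((("A1", "A2"), "Q1"), (("B1", "X1"), "B2"))
labels = {"A1": "sp. A", "A2": "sp. A", "B1": "sp. B",
          "B2": "sp. B", "X1": "sp. A"}

conflicts = non_monophyletic_species(tree, labels)
query_name = identify(tree, labels, "Q1")

# Correcting the label of X1 removes the conflict.
fixed_labels = dict(labels, X1="sp. B")
conflicts_after_fix = non_monophyletic_species(tree, fixed_labels)
```

In this toy case, the mislabeled sequence X1 makes both species non-monophyletic, so both are reported for review, while the query Q1 still falls in a clade whose labeled members all agree. A real pipeline would also need to handle branch support, unrooted trees, and multi-locus data, none of which this sketch attempts.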