National Identification Services, Plant Protection and Quarantine, Animal and Plant Health Inspection Service, U.S. Department of Agriculture, Beltsville, MD 20705.
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894.
Plant Dis. 2022 Jun;106(6):1573-1596. doi: 10.1094/PDIS-09-21-2083-SR. Epub 2022 May 10.
Publicly available and validated DNA reference sequences useful for phylogeny estimation and identification of fungal pathogens are an increasingly important resource in the efforts of plant protection organizations to facilitate safe international trade of agricultural commodities. species are among the most frequently encountered and regulated plant pathogens at U.S. ports-of-entry. The RefSeq Targeted Loci (RTL) project at NCBI (BioProject no. PRJNA177353) contains a database of curated fungal internal transcribed spacer (ITS) sequences that interact extensively with NCBI Taxonomy, resulting in verified name-strain-sequence type associations for >12,000 species. We present a publicly available dataset of verified and curated name-type strain-sequence associations for all available species. This includes an updated GenBank Taxonomy for 238 species associated with up to 11 protein coding loci and an updated RTL ITS dataset for 226 species. We demonstrate that several marker loci are well suited for phylogenetic inference and identification. We improve understanding of phylogenetic relationships among verified species, verify or improve phylogenetic circumscriptions of 14 species complexes, and reveal that determining relationships among these major clades will require additional data. We present detailed comparisons between phylogenetic and similarity-based approaches to species identification, revealing complex patterns among single marker loci that often lead to misidentification when based on single-locus similarity approaches. We also demonstrate that species-level identification is elusive for a subset of samples regardless of analytical approach, which may be explained by novel species diversity in our dataset and incomplete lineage sorting and lack of accumulated synapomorphies at these loci.