SARS-CoV-2 lineage assignments using phylogenetic placement/UShER are superior to pangoLEARN machine-learning method.
Author information
de Bernardi Schneider Adriano, Su Michelle, Hinrichs Angie S, Wang Jade, Amin Helly, Bell John, Wadford Debra A, O'Toole Áine, Scher Emily, Perry Marc D, Turakhia Yatish, De Maio Nicola, Hughes Scott, Corbett-Detig Russ
Affiliations
Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
Publication information
Virus Evol. 2024 Jan 11;10(1):vead085. doi: 10.1093/ve/vead085. eCollection 2024.
With the rapid spread and evolution of SARS-CoV-2, the ability to monitor its transmission and distinguish among viral lineages is critical for pandemic response efforts. The most commonly used software for the lineage assignment of newly isolated SARS-CoV-2 genomes is pangolin, which offers two methods of assignment, pangoLEARN and pUShER. PangoLEARN rapidly assigns lineages using a machine-learning algorithm, while pUShER performs a phylogenetic placement to identify the lineage corresponding to a newly sequenced genome. In a preliminary study, we observed that pangoLEARN (decision tree model), while substantially faster than pUShER, offered less consistency across different versions of pangolin v3. Here, we expand upon this analysis to include v3 and v4 of pangolin, which moved the default algorithm for lineage assignment from pangoLEARN in v3 to pUShER in v4, and perform a thorough analysis confirming that pUShER is not only more stable across versions but also more accurate. Our findings suggest that future lineage assignment algorithms for various pathogens should consider the value of phylogenetic placement.