Daodu Richard Olumide, Awotoro Ebenezer, Ulrich Jens-Uwe, Kühnert Denise
Center for Artificial Intelligence in Public Health Research, Robert Koch Institute, Wildau, Germany.
Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany.
PLoS Negl Trop Dis. 2025 Sep 9;19(9):e0013512. doi: 10.1371/journal.pntd.0013512. eCollection 2025 Sep.
Lassa fever, caused by the Lassa virus (LASV), is a deadly disease characterized by hemorrhage. Each year it affects approximately 300,000 people in West Africa and causes about 5,000 deaths. No vaccine has been approved, and it is categorized as a top-priority disease. Although it is endemic to West Africa, exported cases have been reported on almost all continents, including in several European countries. Distinct Lassa virus lineages circulate in specific regions; they have been reported to differ in immunological behavior and may contribute to differing disease outcomes. It is therefore important to identify rapidly which lineage caused an outbreak or an exported case. We present CLASV, a machine learning-based lineage assignment tool built on a Random Forest classifier. CLASV processes raw nucleotide sequences and assigns them rapidly and accurately to the dominant circulating lineages (II, III, and IV/V). CLASV is implemented in Python for ease of integration into existing workflows and is freely available for public use.
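To illustrate how raw nucleotide sequences might be converted into numeric input for a Random Forest classifier, the following is a minimal sketch that assumes a k-mer frequency encoding. The encoding, the choice of k = 3, and the function names `kmer_vocabulary` and `kmer_frequencies` are hypothetical and are not taken from CLASV, whose actual feature representation is not described here.

```python
from collections import Counter
from itertools import product


def kmer_vocabulary(k: int) -> list[str]:
    """All possible k-mers over A, C, G, T in a fixed (lexicographic) order."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def kmer_frequencies(sequence: str, k: int = 3) -> list[float]:
    """Encode one nucleotide sequence as normalized k-mer frequencies.

    Hypothetical featurization for illustration only. Windows containing
    any character other than A, C, G, T (for example N) are skipped.
    """
    seq = sequence.upper()
    valid = set("ACGT")
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    )
    vocab = kmer_vocabulary(k)
    total = sum(counts.values())
    if total == 0:
        # No usable window: return an all-zero vector of the right length.
        return [0.0] * len(vocab)
    return [counts[kmer] / total for kmer in vocab]


# Example: a short made-up fragment, not a real LASV sequence.
features = kmer_frequencies("ATGATG")
```

Each sequence becomes a fixed-length vector (64 values for k = 3), so sequences of different lengths can be stacked into one feature matrix and passed, together with known lineage labels, to any Random Forest implementation.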