Alberts Famke, Berke Olaf, Rocha Leilani, Keay Sheila, Maboni Grazieli, Poljak Zvonimir
Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, ON, Canada.
Centre for Public Health and Zoonoses, University of Guelph, Guelph, ON, Canada.
Front Vet Sci. 2024 Sep 25;11:1358028. doi: 10.3389/fvets.2024.1358028. eCollection 2024.
Predicting which species are susceptible to viruses (i.e., host range) is important for understanding and developing effective strategies to control viral outbreaks in both humans and animals. The use of machine learning and bioinformatic approaches to predict viral hosts has expanded as these techniques have advanced. We conducted a scoping review to identify the breadth of machine learning methods applied to influenza and coronavirus genome data for the identification of susceptible host species.
The protocol for this scoping review is available at https://hdl.handle.net/10214/26112. Five online databases were searched, yielding 1,217 citations published between January 2000 and May 2022. Citations were screened in duplicate to identify English-language research covering the use of machine learning to identify species susceptible to viruses.
Fifty-three relevant publications were identified for data charting. The breadth of research was extensive, including 32 different machine learning algorithms used in combination with 29 different feature selection methods and 43 different genome data input formats. Authors used 20 different methods to assess accuracy. Authors mostly used influenza viruses (n = 31/53 publications, 58.5%); however, more recent publications focused on coronaviruses and other viruses in combination with influenza viruses (n = 22/53, 41.5%). The susceptible host groups most often used were humans (n = 57/77 analyses, 74.0%), avian (n = 35/77, 45.4%), and swine (n = 28/77, 36.4%). In total, 53 different hosts were used, and most publications used data from multiple hosts.
The main gaps in research were a lack of standardized reporting of methodology and the use of broad host categories for classification. Overall, approaches to viral host identification using machine learning were diverse and extensive.
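To make the kind of pipeline surveyed here concrete, the following is a minimal sketch of one possible combination of genome input format and algorithm: normalized k-mer frequencies fed to a nearest-centroid classifier. It is not a method attributed to any of the charted publications. The function names, the choice of k = 2, and the toy sequences and host labels are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product
import math


def kmer_frequencies(seq, k=2):
    """Return the normalized k-mer frequency vector of a nucleotide sequence.

    The vector has one entry per possible k-mer over A, C, G, T (4**k entries).
    k-mers containing other symbols (e.g. N) are ignored.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def train(labelled_sequences, k=2):
    """Compute one mean k-mer frequency vector (centroid) per host label."""
    by_host = {}
    for seq, host in labelled_sequences:
        by_host.setdefault(host, []).append(kmer_frequencies(seq, k))
    return {
        host: [sum(col) / len(vectors) for col in zip(*vectors)]
        for host, vectors in by_host.items()
    }


def predict(model, seq, k=2):
    """Assign the host whose centroid is closest in Euclidean distance."""
    vector = kmer_frequencies(seq, k)
    return min(model, key=lambda host: math.dist(vector, model[host]))


# Hypothetical toy data: synthetic strings, not real viral genomes.
TOY_DATA = [
    ("ATATATTAATAT", "human"),
    ("AATTATATAATT", "human"),
    ("GCGCCGGCGCGC", "avian"),
    ("CCGGCGCGGCCG", "avian"),
]

model = train(TOY_DATA)
print(predict(model, "ATTATAATATTA"))
print(predict(model, "GCCGGCGCCGCG"))
```

Even in this toy form, the sketch involves several choices that would need to be reported for the result to be reproducible (the value of k, the normalization, the distance measure, and how broadly the host labels are defined), which is the kind of detail the standardized-reporting gap refers to.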