Ma Yue, Hu Yu, Xia Binbin, Du Pei, Wu Lili, Liang Mifang, Chen Qian, Yan Huan, Gao George F, Wang Qihui, Wang Jun
CAS Key Laboratory of Pathogen Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China.
School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, China.
China CDC Wkly. 2021 Nov 12;3(46):967-972. doi: 10.46234/ccdcw2021.235.
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a recently emerged coronavirus of natural origin that caused the coronavirus disease 2019 (COVID-19) pandemic. Studying its natural origin and host range is particularly important for tracing its source, monitoring the virus, and preventing recurrent infections. One major approach is to test how well the viral receptor ACE2 from various hosts binds the SARS-CoV-2 spike protein, but covering a large collection of species this way is time-consuming and labor-intensive.
In this paper, we applied state-of-the-art machine learning approaches to build a pipeline that reached >87% accuracy in predicting binding between ACE2 from different species and the SARS-CoV-2 spike protein.
We further validated the prediction pipeline on 2 independent test sets involving >50 bat species and achieved >78% accuracy. A large-scale screening of 204 mammal species predicted that 144 species (or 61%) were susceptible to SARS-CoV-2 infection, highlighting the importance of intensive monitoring and study of mammalian species.
In short, our study used machine learning models to create an important tool for predicting potential hosts of SARS-CoV-2 and achieved, to our knowledge, the highest precision reported in experimental validation. The study also predicted that a wide range of mammals can be infected by SARS-CoV-2.
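The abstract reports both accuracy and precision for binding predictions checked against experiments. As a minimal sketch of how these two metrics are conventionally computed for a binary "binds / does not bind" label, the Python below uses made-up toy labels; the data, variable names, and helper functions are hypothetical illustrations and are not taken from the study, whose models and features are not described here.

```python
def confusion_counts(y_true, y_pred):
    """Count outcomes for binary labels (1 = binds spike, 0 = does not bind)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of all predictions that match the experimental label."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def precision(y_true, y_pred):
    """Fraction of predicted binders that are confirmed experimentally."""
    tp, _, fp, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical toy data: one entry per species' ACE2 (not real results).
experimental = [1, 1, 0, 1, 0, 1, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 1, 1]

print(f"accuracy  = {accuracy(experimental, predicted):.2f}")
print(f"precision = {precision(experimental, predicted):.2f}")
```

Accuracy counts every correct call, while precision looks only at the species predicted to bind, which is the quantity that matters when predictions are used to prioritize species for experimental follow-up.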