School of Software, Yunnan University, Kunming, China.
School of Information Science and Engineering, Yunnan University, Kunming, China.
J Biomed Inform. 2021 May;117:103736. doi: 10.1016/j.jbi.2021.103736. Epub 2021 Mar 9.
The recent outbreak of COVID-19 has infected millions of people around the world and led to a global emergency. During a virus outbreak, it is crucial to identify the carriers of the virus promptly and accurately so that the animal origins can be isolated to prevent further infection. Traditional identification relies on field and laboratory research, which delays the response to emerging epidemics. With the development of machine learning, recent studies have demonstrated that viral hosts can be predicted efficiently. However, limited annotated virus data and imbalanced host information keep these approaches from achieving better results. To make the prediction of the animal origins of COVID-19 highly reliable, we extend transfer learning and ensemble learning into a hybrid transfer learning model. When predicting the hosts of a newly discovered virus, our model offers a novel solution that uses a related virus domain as an auxiliary source to help build a robust model for the target virus domain. Simulation results on several UCI benchmarks and viral genome datasets demonstrate that our model outperforms classical methods when target training sets are limited and classes are imbalanced. We evaluate the feasibility of our approach by setting coronaviruses as the target domain and other related viruses as the source domain. Finally, we present the predicted animal reservoirs of COVID-19 for further analysis.