Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, USA; School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore.
School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China.
J Biomed Inform. 2023 Jun;142:104388. doi: 10.1016/j.jbi.2023.104388. Epub 2023 May 11.
Influenza viruses pose a major threat to public health and cause enormous economic losses every year. Previous work has identified viral factors associated with the virulence of influenza viruses in mammals. However, few existing studies incorporate prior viral knowledge, which takes the form of heterogeneous categorical and discrete information, when exploring virus virulence. Making full use of this domain knowledge in virulence studies is challenging but beneficial. This paper proposes a general framework named ViPal for virulence prediction in mice that incorporates discrete prior viral mutation and reassortment information based on all eight influenza segments. The posterior regularization technique is used to transform prior viral knowledge into constraint features, which are then integrated into machine learning models. Experimental results on influenza genomic datasets show that the proposed framework improves virulence prediction performance over baselines. A comparison between ViPal and existing methods shows that the framework is computationally efficient while achieving comparable or superior performance. Moreover, interpretability analysis with SHAP (SHapley Additive exPlanations) identifies the scores of the constraint features that contribute to the prediction. We hope this framework can assist in the accurate detection of influenza virulence and facilitate flu surveillance.
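As a rough illustration of what "constraint features" derived from discrete prior knowledge might look like, the sketch below turns a small table of mutation markers into 0/1 indicators that could be appended to a model's input. It shows only the encoding step and does not implement posterior regularization. The table `PRIOR_MARKERS`, its entries, and the function `constraint_features` are hypothetical names and placeholder values chosen for this example, not the marker set or API used by ViPal.

```python
from typing import Dict, List, Tuple

# Hypothetical prior-knowledge table: (protein, 1-based position, residue).
# The entries are illustrative placeholders, not the markers used by ViPal.
PRIOR_MARKERS: List[Tuple[str, int, str]] = [
    ("PB2", 627, "K"),
    ("PB2", 701, "N"),
    ("NS1", 42, "S"),
]


def constraint_features(proteins: Dict[str, str]) -> List[int]:
    """Return one 0/1 indicator per prior marker for a single strain.

    `proteins` maps a protein name to its amino-acid sequence.
    A missing protein or a sequence that is too short yields 0.
    """
    features = []
    for protein, position, residue in PRIOR_MARKERS:
        sequence = proteins.get(protein, "")
        present = len(sequence) >= position and sequence[position - 1] == residue
        features.append(int(present))
    return features
```

The resulting indicator vector can be concatenated with whatever sequence-derived features a baseline classifier already uses, which also makes each marker's contribution visible to attribution tools such as SHAP.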