Das Samarendra, Nayak Utkal, Pal Soumen, Subramaniam Saravanan
Biostatistics and Bioinformatics Facility, ICAR-National Institute on Foot and Mouth Disease, Arugul, Bhubaneswar 752050, India.
Division of Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, Pusa, New Delhi 110012, India.
Brief Funct Genomics. 2025 Jan 15;24. doi: 10.1093/bfgp/elaf001.
Molecular epidemiology of foot-and-mouth disease (FMD), which primarily involves determining the serotype, topotype, and lineage of the virus, is crucial for implementing control strategies such as vaccination and containment. Existing approaches, including serotyping, are biological in nature; they are time-consuming and risky because they require handling live virus. Novel computational tools are therefore needed for large-scale molecular epidemiology of the FMD virus. This study reports a comprehensive computational tool for FMD molecular epidemiology. Ten learning algorithms were first evaluated for serotype prediction from sequence-based features, using cross-validation and ten independent secondary datasets, and were scored on accuracy, sensitivity, and 14 other metrics. The best-performing algorithms, those with the highest serotype prediction accuracy, were then evaluated for topotype and lineage prediction using cross-validation and implemented in the computational tool. Finally, the developed approach was assessed on five previously unseen independent secondary datasets and on primary experimental data. The cross-validated and independent evaluations for serotype prediction showed that the support vector machine, random forest, XGBoost, and AdaBoost algorithms outperformed the others. For topotype and lineage prediction, these four algorithms achieved accuracy ≥96% and precision ≥95% on cross-validated data. They are implemented in the MolEpidPred web server (https://nifmd-bbf.icar.gov.in/MolEpidPred), which allows rapid molecular epidemiology of the FMD virus. In independent validation, MolEpidPred achieved accuracies of ≥98%, ≥90%, and ≥80% for serotype, topotype, and lineage prediction, respectively. On wet-lab data, MolEpidPred returned results within a few seconds and achieved accuracies of 100%, 100%, and 96% for serotype, topotype, and lineage prediction, respectively, when benchmarked against phylogenetic analysis. MolEpidPred provides an innovative platform for large-scale molecular epidemiology of the FMD virus, which is crucial for tracking FMD virus infection and implementing control programs.
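As a rough illustration of the kind of pipeline the abstract describes, the sketch below turns a nucleotide sequence into a numeric feature vector and scores predicted labels by accuracy, sensitivity, and precision. It is not the authors' implementation: the abstract does not say which sequence-based features were used, so k-mer composition is an assumption here, and the function names `kmer_frequencies` and `evaluate` and the toy labels are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=2):
    """Return normalized k-mer frequencies over the A/C/G/T alphabet.

    Assumption: k-mer composition is only one possible sequence-based
    feature; the abstract does not state which features were used.
    """
    sequence = sequence.upper()
    # Fixed feature order: all 4**k k-mers in lexicographic order.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Only k-mers made of A/C/G/T count toward the total (ambiguous bases are skipped).
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def evaluate(y_true, y_pred):
    """Compute overall accuracy plus per-class sensitivity (recall) and precision."""
    labels = sorted(set(y_true) | set(y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    report = {"accuracy": correct / len(y_true), "per_class": {}}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        report["per_class"][label] = {
            "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
        }
    return report


# Toy usage with made-up labels (not data from the study).
features = kmer_frequencies("ACGTACGT", k=2)  # 16-dimensional feature vector
scores = evaluate(["O", "O", "A", "Asia1"], ["O", "A", "A", "Asia1"])
```

In a setup like the one described, feature vectors of this kind would be passed to a classifier such as a support vector machine or random forest, and the same metrics would be computed on each cross-validation fold and on each independent dataset.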