Su Shuquan, Ni Zhongran, Lan Tian, Ping Pengyao, Tang Jinling, Yu Zuguo, Hutvagner Gyorgy, Li Jinyan
Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, China.
School of Computer Science (SoCS), Faculty of Engineering and Information Technology (FEIT), University of Technology Sydney (UTS), Sydney, Australia.
Sci Rep. 2025 Apr 10;15(1):12251. doi: 10.1038/s41598-025-91469-z.
The viral codon fitness (VCF) of a host and shifts in VCF have seldom been studied quantitatively, although they may be vital concepts for understanding pathogen epidemiology. This study demonstrates that the relative synonymous codon usage (RSCU) of virus genomes, together with other genomic properties, is predictive of virus host codon fitness through tree-based machine learning. Statistical analysis of the RSCU data matrix also revealed that the wobble (third) position of viral codons is critically important for distinguishing host codon fitness. Because the trained models characterise the host codon fitness of viruses well, the frequencies and other details stored at their leaf nodes can be reliably translated into a human virus codon fitness score (HVCF score), a readout of the codon fitness of any virus infecting humans. Specifically, we evaluated and compared the HVCF of virus genome sequences from human and other sources, and evaluated the HVCF of SARS-CoV-2 genome sequences from the NCBI Virus database, where we found no obvious trend of host codon fitness shifting towards the human-non-infectious class. We also developed a bioinformatics tool that simulates codon-based virus fitness shifting from the codon compositions of viruses, and found that Tylonycteris bat coronavirus HKU4-related viruses may be closely related to SARS-CoV-2 in terms of human codon fitness. The abundance of synonymous mutations in the predicted codon fitness shifting paths also provides new insights for evolution research and for virus monitoring in environmental surveillance.
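The main input feature named above, relative synonymous codon usage (RSCU), has a standard definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally, i.e. RSCU_ij = X_ij / ((1/n_i) * sum_k X_ik). The sketch below computes it for one in-frame coding sequence using only the Python standard library. It illustrates the textbook formula, not the authors' feature-extraction pipeline; the function name `rscu`, the use of NCBI translation table 1, keeping stop, Met and Trp codons, and assigning 0 to codons of absent amino acids are all assumptions made for this example.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(sequence: str) -> dict[str, float]:
    """Compute RSCU for all 64 codons of an in-frame coding sequence.

    RSCU_ij = X_ij * n_i / sum_k X_ik, where X_ij is the count of codon j
    for amino acid i and n_i is the number of synonymous codons for i.
    """
    seq = sequence.upper().replace("U", "T")  # accept RNA as well as DNA
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    # Codons containing ambiguous bases (e.g. N) are not in the table and are skipped.
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons by the amino acid they encode (stop codons grouped as "*").
    synonyms: dict[str, list[str]] = {}
    for codon, aa in CODON_TABLE.items():
        synonyms.setdefault(aa, []).append(codon)

    result: dict[str, float] = {}
    for aa, family in synonyms.items():
        total = sum(counts[c] for c in family)
        for codon in family:
            # Assumption: codons of amino acids absent from the sequence get 0.0.
            result[codon] = counts[codon] * len(family) / total if total else 0.0
    return result
```

For example, `rscu("ATGGCTGCTGCAGCC")` gives `GCT` an RSCU of 2.0 (used twice as often as expected among the four alanine codons), `GCC` and `GCA` 1.0, `GCG` 0.0, and `ATG` 1.0. Computing such a 64-value vector per genome yields the kind of RSCU data matrix the abstract refers to; many codon usage studies additionally drop Met, Trp and stop codons, which this sketch does not.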