Zhang Sen, Li Ya-Dan, Cai Yu-Rong, Kang Xiao-Ping, Feng Ye, Li Yu-Chang, Chen Yue-Hong, Li Jing, Bao Li-Li, Jiang Tao
State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing, China.
College of Basic Medical Sciences, Anhui Medical University, Hefei, China.
Front Genet. 2024 Mar 1;15:1361952. doi: 10.3389/fgene.2024.1361952. eCollection 2024.
Global headlines have been dominated by the sudden and widespread outbreak of monkeypox, a rare and endemic zoonotic disease caused by the monkeypox virus (MPXV). Machine learning (ML) methods based on genomic composition have recently shown promise in identifying the host adaptability and evolutionary patterns of viruses. Our study aimed to analyze the genomic characteristics and evolutionary patterns of MPXV using ML methods. The open reading frame (ORF) regions of full-length MPXV genomes were screened, and the 165 ORF clusters with the highest homology were selected. The unsupervised ML methods t-distributed stochastic neighbor embedding (t-SNE), principal component analysis (PCA), and hierarchical clustering were used to examine the DCR characteristics of the selected ORF clusters. The post-2022 MPXV sequences showed a clear linear adaptive evolution, indicating that the virus has become more adapted to the human host as it accumulated mutations. To refine the analysis, the ORF regions with larger variation were selected according to their ranking by homology difference, narrowing the set down to the key ORF clusters; this analysis led to the same conclusion of linear adaptation. The structures of the key differential proteins were then predicted with AlphaFold 2, and the results suggest that differences in the main domains may be one of the underlying reasons for the linear adaptive evolution. Understanding the process of linear adaptation is critical in the ongoing evolutionary struggle between viruses and their hosts, and it plays a significant role in designing effective measures against viral diseases. The present study therefore provides valuable insights into the evolutionary patterns of MPXV in 2022 from the perspective of genomic composition analysis using ML methods.
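As a rough illustration of what a genomic composition feature can look like, the sketch below computes the dinucleotide frequencies of a nucleotide sequence and compares two sequences by Euclidean distance. It assumes that "DCR" refers to a dinucleotide composition representation, which the abstract does not define. The function names and the toy sequences are hypothetical and are not taken from the study. Vectors of this kind are the sort of input that PCA, t-SNE, or hierarchical clustering would then operate on.

```python
from itertools import product
from math import sqrt

# The 16 possible dinucleotides in a fixed order: AA, AC, AG, AT, CA, ...
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]


def dinucleotide_composition(seq):
    """Return the 16 dinucleotide frequencies of seq (hypothetical helper).

    Overlapping pairs are counted; pairs containing characters other than
    A, C, G, or T (for example N) are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        # No valid pair found: return an all-zero vector.
        return [0.0] * len(DINUCLEOTIDES)
    return [counts[d] / total for d in DINUCLEOTIDES]


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    # Toy sequences for illustration only; these are not MPXV ORFs.
    orf_a = "ATGGCTAAAGGTTTAGCTTAA"
    orf_b = "ATGGCAAAAGGATTAGCATAA"
    vec_a = dinucleotide_composition(orf_a)
    vec_b = dinucleotide_composition(orf_b)
    print(dict(zip(DINUCLEOTIDES, vec_a)))
    print(euclidean_distance(vec_a, vec_b))
```

Normalizing the counts to frequencies keeps ORFs of different lengths comparable, which matters when the vectors are later projected or clustered together.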