Cho Myeongji, Kim Hayeon, Son Hyeon S
Laboratory of Computational Virology & Viroinformatics, Graduate School of Public Health, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, Korea.
Institute of Health and Environment, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, Korea.
Genes Genomics. 2021 Apr;43(4):407-420. doi: 10.1007/s13258-021-01059-2. Epub 2021 Mar 1.
The large tumor antigen (LT-Ag) and major capsid protein VP1 are known to play important roles in determining the host-specific infection properties of polyomaviruses (PyVs).
The objective of this study was to identify the physicochemical properties of LT-Ag and VP1 amino acids that most strongly affect host specificity, and to evaluate classification techniques for predicting PyV hosts.
We analyzed reference sequences from 86 viral species. Based on the clustering pattern of the reconstructed phylogenetic tree, we divided the dataset into three groups: mammalian, avian, and fish. We then used random forest (RF), naïve Bayes (NB), and k-nearest neighbors (kNN) algorithms for host classification.
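The abstract does not state which feature encoding, distance metric, or value of k the kNN classifier used. The sketch below assumes 20-dimensional amino acid composition vectors, Euclidean distance, and k = 3. It shows only how a kNN host classifier votes among its nearest labelled neighbours. The training fragments are hypothetical toy strings, not real LT-Ag or VP1 sequences, and the helper names `aa_composition` and `knn_predict` are illustrative.

```python
import math
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors align.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Return the fraction of each standard amino acid in `seq` (20 values)."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to `query`.

    `train` is a list of (feature_vector, host_label) pairs. Distances are
    Euclidean. On a tied vote, Counter.most_common keeps first-seen order,
    so the label of the nearer neighbour wins.
    """
    neighbours = sorted((math.dist(vec, query), label) for vec, label in train)
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy fragments (NOT real PyV protein sequences), two per host group.
TOY_TRAIN = [
    (aa_composition("MKKRLAKKRG"), "mammalian"),
    (aa_composition("MKRKLAKRKG"), "mammalian"),
    (aa_composition("MDEELADEEG"), "avian"),
    (aa_composition("MEDELADEDG"), "avian"),
    (aa_composition("MSTSGTSTSG"), "fish"),
    (aa_composition("MTSTGSTSTG"), "fish"),
]

if __name__ == "__main__":
    # The query has the same composition as both toy "mammalian" fragments.
    print(knn_predict(TOY_TRAIN, aa_composition("MKKKLARRKG"), k=3))
```

In a real analysis, the feature vectors would come from the reference sequences. Accuracy would be estimated with cross-validation rather than from a single query.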
Among the three algorithms, kNN achieved the highest classification accuracy for both LT-Ag (ACC = 98.83%) and VP1 (ACC = 96.51%). In LT-Ag, the amino acid physicochemical property most strongly correlated with host classification was charge, followed by solvent accessibility, polarity, and hydrophobicity. In VP1, however, amino acid composition showed the highest correlation with host classification, followed by charge, normalized van der Waals volume, and solvent accessibility.
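The property names above (charge, polarity, hydrophobicity, solvent accessibility, normalized van der Waals volume) match those used in the common composition/transition/distribution (CTD) family of protein descriptors. The abstract does not say how the authors encoded them, so the following is an assumption. One such encoding groups residues into property classes and reports the fraction of residues in each class. The sketch uses a three-class charge grouping (positive K, R; negative D, E; all other standard residues neutral). `CHARGE_GROUPS` and `group_composition` are hypothetical names.

```python
# Assumed three-class charge grouping in the style of CTD descriptors.
# Whether the study used exactly these groups is not stated in the abstract.
CHARGE_GROUPS = {
    "positive": set("KR"),
    "neutral": set("ANCQGHILMFPSTWYV"),
    "negative": set("DE"),
}


def group_composition(seq, groups):
    """Fraction of residues in `seq` that fall into each property group.

    Residues outside every group (e.g. 'X' or gap characters) are ignored.
    """
    assigned = [
        name
        for residue in seq.upper()
        for name, members in groups.items()
        if residue in members
    ]
    total = len(assigned)
    return {name: (assigned.count(name) / total if total else 0.0) for name in groups}


if __name__ == "__main__":
    # K and R are positive, D is negative, A is neutral.
    print(group_composition("KRDA", CHARGE_GROUPS))
```

Other properties can reuse the same function with a different three-class grouping. The resulting values can then be concatenated with the amino acid composition to form the classifier's feature vector.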
These results suggest that the host range and infection properties of PyVs could be determined or predicted at the molecular level, by identifying the host species of active and emerging PyVs whose infection properties differ across host species. Structural and biochemical differences in the LT-Ag and VP1 proteins among host species that reflect these amino acid properties can be considered primary factors determining PyV host specificity.