Broni Emmanuel, Miller Whelton A
Department of Medicine, Loyola University Medical Center, Loyola University Chicago, Maywood, IL 60153, USA.
Department of Molecular Pharmacology & Neuroscience, Loyola University Medical Center, Loyola University Chicago, Maywood, IL 60153, USA.
Biomedicines. 2023 Feb 10;11(2):512. doi: 10.3390/biomedicines11020512.
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a serious global challenge requiring urgent and permanent therapeutic solutions. These solutions can only be engineered if the patterns and rate of mutations of the virus can be elucidated. Predicting mutations, and the structure of proteins based on these mutations, has become necessary for early drug and vaccine design in anticipation of future viral mutations. The amino acid composition (AAC) of proteomes and individual viral proteins provides avenues for exploitation, since AACs have previously been used to predict structure, shape and evolutionary rates. Herein, the frequencies of amino acid residues found in 1637 complete proteomes belonging to 11 SARS-CoV-2 variants/lineages were analyzed. Leucine was the most abundant amino acid residue in SARS-CoV-2, with an average AAC of 9.658%, while tryptophan was the least abundant at 1.11%. The AAC and ranking of lysine and glycine varied across the proteomes: in some variants glycine had a higher frequency and AAC than lysine, and vice versa in others. Tryptophan was also observed to be the most intolerant to mutation in the proteomes of the variants used. A correlogram revealed a very strong correlation of 0.999992 between the B.1.525 (Eta) and B.1.526 (Iota) variants. Furthermore, isoleucine and threonine had a very strong negative correlation of -0.912, while cysteine and isoleucine had a very strong positive correlation of 0.835, at p < 0.001. The Shapiro-Wilk normality test revealed that AAC values for all amino acid residues except methionine showed no evidence of non-normality at p < 0.05. Thus, AACs of SARS-CoV-2 variants can be predicted using probability and z-scores. AACs may be beneficial in classifying viral strains, predicting viral disease types, members of protein families and protein interactions, and for diagnostic purposes. They may also be used as a feature, along with other crucial factors, in machine-learning-based algorithms to predict viral mutations. These mutation-predicting algorithms may help in developing effective therapeutics and vaccines for SARS-CoV-2.
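As a rough illustration of the quantities discussed in the abstract, the sketch below computes an AAC as the percentage of each of the 20 standard residues in a sequence, and then a z-score and normal-model probability for one AAC value relative to a set of values. It is not the authors' pipeline: the function names (`aac`, `z_score`, `probability_below`) are hypothetical, the example sequence and numbers are invented toy inputs rather than data from the study, and the normal model is only an assumption motivated by the reported Shapiro-Wilk results.

```python
from collections import Counter
from statistics import NormalDist, mean, stdev

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence: str) -> dict[str, float]:
    """Return the percentage of each standard residue in `sequence`.

    Non-standard symbols (e.g. 'X', '*', '-') are ignored (an assumption).
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acid residues")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def z_score(value: float, values: list[float]) -> float:
    """Standard score of `value` against the sample mean and sample SD."""
    return (value - mean(values)) / stdev(values)


def probability_below(value: float, values: list[float]) -> float:
    """P(X <= value) under a normal model fitted to `values` (assumed)."""
    return NormalDist(mean(values), stdev(values)).cdf(value)


if __name__ == "__main__":
    toy_sequence = "LLLLWAGK"  # invented toy peptide, not a viral protein
    composition = aac(toy_sequence)
    print(f"Leu: {composition['L']:.1f}%  Trp: {composition['W']:.1f}%")

    # Invented AAC values (percent) for one residue across five variants.
    toy_values = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(z_score(3.0, toy_values), probability_below(3.0, toy_values))
```

With real data, `values` would hold one residue's AAC across the variant proteomes, and a large absolute z-score would flag a variant whose composition departs from the rest.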