Center for Precision Medicine Research, Marshfield Clinic Research Institute, Marshfield, Wisconsin 54449, United States.
Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 300, Taiwan.
J Proteome Res. 2021 May 7;20(5):2942-2952. doi: 10.1021/acs.jproteome.1c00156. Epub 2021 Apr 15.
There is an urgent need to elucidate the underlying mechanisms of coronavirus disease 2019 (COVID-19) so that vaccines and treatments can be devised. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) shares genetic similarity with bat and pangolin viruses, but a comprehensive understanding of the functions of its proteins at the amino acid sequence level is lacking. A total of 4320 sequences of human and nonhuman coronaviruses were retrieved from the Global Initiative on Sharing All Influenza Data (GISAID) and the National Center for Biotechnology Information (NCBI). This work proposes an optimization method, COVID-Pred, with an efficient feature selection algorithm to classify species-specific coronaviruses based on the physicochemical properties (PCPs) of their sequences. Using a support vector machine, COVID-Pred identified a set of 11 PCPs and achieved a 10-fold cross-validation accuracy of 99.53% and a test accuracy of 97.80%. These findings could provide key insights into the driving forces during the course of infection and assist in developing effective therapies.
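As a rough illustration of the kind of pipeline the abstract describes, the sketch below encodes a protein sequence as a vector of averaged physicochemical property values and builds index splits for 10-fold cross-validation. It is not the COVID-Pred implementation: the abstract does not say how PCPs are encoded, so averaging per-residue values is an assumption, and the two scales used here (the Kyte-Doolittle hydropathy index and a simplified side-chain charge) are stand-ins rather than the 11 PCPs the authors selected. The function names `pcp_features` and `k_fold_indices` are hypothetical, and the feature selection and support vector machine steps are left out.

```python
from statistics import mean

# Kyte-Doolittle hydropathy index, one widely used physicochemical scale.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charge near neutral pH (illustrative assumption;
# residues not listed are treated as uncharged).
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def pcp_features(sequence):
    """Encode a protein sequence as the mean value of each PCP scale.

    Characters that are not one of the 20 standard amino acids are skipped.
    """
    residues = [aa for aa in sequence.upper() if aa in KD_HYDROPATHY]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return [
        mean(KD_HYDROPATHY[aa] for aa in residues),
        mean(CHARGE.get(aa, 0.0) for aa in residues),
    ]


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k-fold cross-validation.

    Samples are assigned to folds round-robin; shuffle the data beforehand
    if its order is not random.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for held_out, test in enumerate(folds):
        train = [
            idx
            for fold_id, fold in enumerate(folds)
            if fold_id != held_out
            for idx in fold
        ]
        yield train, test
```

In a fuller version, each sequence's feature vector would be passed to a classifier trained on the `train` indices and scored on the `test` indices of every fold, with the per-fold accuracies averaged to give the cross-validation accuracy.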