Institute of Intelligence Informatics Technology, Sangmyung University, Seoul 03016, Korea.
Department of Pediatrics, Washington University in St. Louis, St. Louis, MO 63110, USA.
Int J Mol Sci. 2022 Aug 23;23(17):9518. doi: 10.3390/ijms23179518.
Streptococcus pyogenes, or group A Streptococcus (GAS), a gram-positive bacterium, is implicated in a wide range of clinical manifestations and life-threatening diseases. One of the key virulence factors of GAS is streptopain, a C10 family cysteine peptidase. Since its discovery, homologs of streptopain have been reported in various other bacterial species. As sequencing becomes more affordable, the number of potential C10 family-like sequences in public databases is expected to grow substantially, which makes classifying such sequences a challenge. Sequence-similarity-based tools are currently the methods of choice for identifying streptopain-like sequences. However, these methods require some level of sequence similarity between known C10 family members and the target sequences. Therefore, in this work, we propose C10Pred, a novel predictor of C10 peptidases that uses optimal sequence-derived features. C10Pred is a support vector machine (SVM)-based model that, when tested on an independent dataset, predicts C10 enzymes with an overall accuracy of 92.7% and a Matthews correlation coefficient (MCC) of 0.855. We anticipate that C10Pred will serve as a handy tool for classifying novel streptopain-like proteins belonging to the C10 family and will offer essential information about them.
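Accuracy and MCC, the two metrics quoted for the independent test, follow the standard confusion-matrix definitions. The minimal Python sketch below computes both from true/false positive and negative counts. The function names and the example counts are hypothetical illustrations, not code or data from the C10Pred study, and returning 0.0 when the MCC denominator is zero is a common convention assumed here.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct: (TP + TN) / total."""
    total = tp + tn + fp + fn
    return (tp + tn) / total


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Assumed convention: return 0.0 when any marginal sum is zero.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not from the C10Pred paper).
    tp, tn, fp, fn = 45, 40, 5, 10
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
    print(f"MCC      = {mcc(tp, tn, fp, fn):.3f}")
```

Unlike accuracy, MCC ranges from -1 to +1 and uses all four confusion-matrix cells, which is why it is often reported alongside accuracy when the positive and negative classes differ in size.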