Su Min-Gang, Huang Chien-Hsun, Lee Tzong-Yi, Chen Yu-Ju, Wu Hsin-Yi
Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan.
Tao-Yuan Hospital, Ministry of Health & Welfare, Taoyuan 320, Taiwan.
Biomed Res Int. 2014;2014:972692. doi: 10.1155/2014/972692. Epub 2014 Jul 7.
Beyond their role in pathogenesis, bacterial toxins have also been used for medical purposes, such as drugs for cancer and immune diseases. Correctly identifying bacterial toxins and their types (endotoxins and exotoxins) has a great impact on cell biology research and therapy development. However, experimental methods for identifying bacterial toxins are time-consuming and labor-intensive, which creates an urgent need for computational prediction. We were therefore motivated to develop a method for the computational identification of bacterial toxins based on amino acid sequences and functional domain information. In this study, a nonredundant dataset of 167 bacterial toxins, comprising 77 exotoxins and 90 endotoxins, is used to train predictive models with support vector machines (SVMs). Cross-validation shows that the SVM models trained with amino acid composition and dipeptide composition yield accuracies of 96.07% and 92.50%, respectively. For discriminating endotoxins from exotoxins, the SVM models trained with amino acid composition and dipeptide composition achieve accuracies of 95.71% and 92.86%, respectively. Incorporating functional domain information further improves the predictive performance. On an independent dataset, the proposed method identifies and classifies bacterial toxins more effectively than models using the other two features, which may aid bacterial biomedical development.
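As an illustration of the two sequence-derived feature types named in the abstract, the following is a minimal sketch and not the authors' code. The function names, the normalization to frequencies, and the choice to skip non-standard residues are assumptions made here for the example. The abstract does not specify these details, nor the SVM kernel or parameters.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, e.g. "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return a 20-dimensional vector of residue frequencies.

    Assumption: residues outside the 20 standard letters are ignored.
    """
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return a 400-dimensional vector of adjacent-pair frequencies.

    Assumption: pairs containing a non-standard residue are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]
```

Each protein sequence becomes one fixed-length vector regardless of its length, which is what allows the vectors to be passed to an SVM implementation (for example, scikit-learn's `SVC`, named here only as one possible choice).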