Tong Joo Chuan, Tammi Martti T
Data Mining Department, Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613.
Front Biosci. 2008 May 1;13:6072-8. doi: 10.2741/3138.
The constant increase in atopic allergy and other hypersensitivity reactions has intensified the need for successful therapeutic approaches. Existing bioinformatic tools for predicting allergenic potential are primarily based on sequence similarity searches along the entire protein sequence and do not address the dual issues of conformational and overlapping B-cell epitope recognition sites. In this study, we report AllerPred, a computational system that uses a support vector machine (SVM) as its prediction engine. A novel representation of local protein sequence descriptors enables the system to model multiple overlapping continuous and discontinuous B-cell epitope binding patterns within an allergenic protein sequence. The model was trained and tested using 669 IUIS allergens and 1237 non-allergens. Testing showed that the area under the receiver operating characteristic curve (AROC) of the SVM models was 0.81, with 76 percent sensitivity at a specificity of 76 percent. On a standardized testing dataset of experimentally validated allergen and non-allergen sequences, this approach consistently outperformed existing allergenicity prediction systems.
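As an illustration of the two reported metrics, the sketch below computes an area under the ROC curve and a sensitivity/specificity pair from classifier decision scores. It is a minimal example under stated assumptions, not AllerPred's evaluation code: the function names, the scores, and the threshold are all hypothetical and chosen only to show how the quantities are defined.

```python
def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive scores higher than a randomly chosen negative
    (ties count as half a win)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def sensitivity_specificity(scores_pos, scores_neg, threshold):
    """Sensitivity and specificity when a sequence is called an allergen
    if its score is at or above the threshold."""
    true_pos = sum(1 for s in scores_pos if s >= threshold)
    true_neg = sum(1 for s in scores_neg if s < threshold)
    return true_pos / len(scores_pos), true_neg / len(scores_neg)


# Hypothetical decision scores, invented for illustration only.
allergen_scores = [0.9, 0.8, 0.4, 0.7]
non_allergen_scores = [0.1, 0.5, 0.3, 0.6]

auc = roc_auc(allergen_scores, non_allergen_scores)
sens, spec = sensitivity_specificity(allergen_scores, non_allergen_scores, 0.55)
print(f"AUC={auc:.3f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Moving the threshold trades sensitivity against specificity, which is why a sensitivity figure is quoted at a stated specificity; the area under the curve summarizes performance across all thresholds.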