Kozlova Edgar, Viart Benjamin, de Avila Ricardo, Felicori Liza, Chavez-Olortegui Carlos
BMC Bioinformatics. 2015;16 Suppl 19(Suppl 19):S7. doi: 10.1186/1471-2105-16-S19-S7. Epub 2015 Dec 16.
The humoral immune response relies on the interaction between antibodies and antigens to clear pathogens and foreign molecules. The interaction between these proteins occurs at specific positions known as antigenic determinants or B-cell epitopes. Experimental identification of epitopes is costly and time-consuming. In silico methods that help discover new epitopes are therefore an appealing alternative, given the importance of biomedical applications such as vaccine design, disease diagnostics, anti-venoms and immunotherapeutics. However, prediction performance remains suboptimal, with accuracy around 70%. Further research could increase our understanding of the biochemical and structural properties that characterize a B-cell epitope.
We investigated whether linear epitopes from the same protein family share common properties. This hypothesis led us to analyze physico-chemical (PCP) and predicted secondary structure (PSS) features of a curated dataset of epitope sequences, available in the literature, belonging to two different groups of antigens (metalloproteinases and neurotoxins). Using data mining techniques, we identified statistically significant parameters that distinguish neurotoxin epitopes from metalloproteinase epitopes, and both groups from random sequences. After five-fold cross-validation, PCP-based models achieved area under the curve (AUC) and accuracy values above 0.9 for regression, decision tree and support vector machine classifiers.
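The evaluation scheme described above (physico-chemical features per peptide, five-fold cross-validation, AUC) can be illustrated with a minimal sketch. It is not the authors' pipeline: the two features (mean Kyte-Doolittle hydropathy and net charge per residue), the nearest-centroid scorer used in place of the regression, decision tree and support vector machine models, and the randomly generated toy peptides are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
import random

# Kyte-Doolittle hydropathy scale (standard published values).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Simplified side-chain charges near neutral pH (assumption: His treated as neutral).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def pcp_features(peptide):
    """Return (mean hydropathy, net charge per residue) for one peptide."""
    n = len(peptide)
    hydropathy = sum(KD[aa] for aa in peptide) / n
    charge = sum(CHARGE.get(aa, 0) for aa in peptide) / n
    return (hydropathy, charge)


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def centroid(rows):
    """Column-wise mean of a list of feature tuples."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(X, y, k=5, seed=0):
    """Score each sample with a model fitted on the other folds, then pool for one AUC."""
    scores = [0.0] * len(X)
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        c0 = centroid([X[i] for i in train if y[i] == 0])
        c1 = centroid([X[i] for i in train if y[i] == 1])
        for i in test_idx:
            # Higher score means closer to the class-1 centroid.
            scores[i] = math.dist(X[i], c0) - math.dist(X[i], c1)
    return auc(scores, y)


def toy_peptides(alphabet, count, length, rng):
    """Random peptides drawn from an alphabet; toy data, NOT real epitopes."""
    return ["".join(rng.choice(alphabet) for _ in range(length)) for _ in range(count)]


rng = random.Random(42)
group_a = toy_peptides("ACDEFGHIKLMNPQRSTVWY", 40, 12, rng)  # uniform composition
group_b = toy_peptides("DEKRSTNQGP", 40, 12, rng)            # hydrophilic-biased composition
X = [pcp_features(p) for p in group_a + group_b]
y = [0] * 40 + [1] * 40
cv_auc = cross_validated_auc(X, y)
print(f"5-fold cross-validated AUC on toy data: {cv_auc:.3f}")
```

Pooling the held-out scores before computing a single AUC is one common convention; averaging per-fold AUCs is another, and the abstract does not state which was used.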
We demonstrated that an antigen's family can be inferred from properties within a single group of linear epitopes (metalloproteinases or neurotoxins). We also identified the characteristics that represent these two epitope groups, including their similarities to and differences from random peptides, and their respective amino acid sequences. These findings open new perspectives for improving epitope prediction by considering the antigen's specific protein family. We expect them to help improve current computational mapping methods based on physico-chemical properties, given their potential application in epitope discovery.