Department of Medicine, Division of Infectious Diseases, University of California, Irvine, CA 92697, USA.
Chem Biodivers. 2012 May;9(5):977-90. doi: 10.1002/cbdv.201100360.
Discovery of novel antigens associated with infectious diseases is fundamental to the development of serodiagnostic tests and protein subunit vaccines against existing and emerging pathogens. Efforts to predict antigenicity have relied on a few computational algorithms that predict signal peptide sequences (SignalP), transmembrane domains, or subcellular localization (pSort). An empirical protein microarray approach was developed to scan the entire proteome of any infectious microorganism and determine, in infected individuals, immunoglobulin reactivity against all of that microorganism's antigens. The current database from this activity contains quantitative antibody reactivity data against 35,000 proteins derived from 25 infectious microorganisms and more than 30 million data points derived from 15,000 patient sera. Interrogation of these data sets has revealed ten proteomic features that are associated with antigenicity, allowing an in silico approach, based on protein sequence and functional annotation, to triage proteins by separating those least likely to be antigenic from those more likely to be antigenic. The first iteration of this approach, applied to Brucella melitensis, selected 37% of the bacterial proteome, and this subset contained 91% of the antigens empirically identified by probing proteome microarrays. In this study, we describe a naïve Bayes classification approach that can be used to assign a relative score to the likelihood that an antigen in a bacterial proteome will be immunoreactive and serodiagnostic. This algorithm selected 20% of the B. melitensis proteome, including 91% of the serodiagnostic antigens, a nearly twofold improvement in the specificity of the predictor. These results give us confidence that continued development of this approach will further improve the sensitivity and specificity of this in silico predictive algorithm.
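The scoring idea described above can be illustrated with a minimal sketch of a Bernoulli naive Bayes classifier over binary protein features. This is not the authors' implementation: the abstract does not list the ten proteomic features, so the feature names (`signal_peptide`, `transmembrane`), the toy training set, the Laplace smoothing constant `alpha`, and the helper names `train_naive_bayes`, `log_odds`, and `top_fraction` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def train_naive_bayes(samples, labels, alpha=1.0):
    """Fit a Bernoulli naive Bayes model.

    samples: list of dicts mapping a (hypothetical) feature name to True/False.
    labels:  list of booleans, True if the protein was empirically antigenic.
    alpha:   Laplace smoothing constant (an assumption, not from the source).
    """
    n = {True: 0, False: 0}
    counts = {True: defaultdict(int), False: defaultdict(int)}
    features = set()
    for x, y in zip(samples, labels):
        n[y] += 1
        for name, present in x.items():
            features.add(name)
            if present:
                counts[y][name] += 1
    total = n[True] + n[False]
    model = {"features": sorted(features), "prior": {}, "p": {}}
    for y in (True, False):
        model["prior"][y] = (n[y] + alpha) / (total + 2 * alpha)
        model["p"][y] = {
            f: (counts[y][f] + alpha) / (n[y] + 2 * alpha) for f in features
        }
    return model


def log_odds(model, x):
    """Relative score: log odds that a protein is antigenic given its features."""
    score = math.log(model["prior"][True] / model["prior"][False])
    for f in model["features"]:
        p_pos = model["p"][True][f]
        p_neg = model["p"][False][f]
        if x.get(f, False):
            score += math.log(p_pos / p_neg)
        else:
            score += math.log((1 - p_pos) / (1 - p_neg))
    return score


def top_fraction(model, proteome, fraction):
    """Return the IDs of the highest-scoring fraction of a proteome."""
    ranked = sorted(proteome, key=lambda pid: log_odds(model, proteome[pid]),
                    reverse=True)
    keep = max(1, round(fraction * len(ranked)))
    return ranked[:keep]


# Toy training data (entirely made up for illustration).
train_x = [
    {"signal_peptide": True, "transmembrane": False},
    {"signal_peptide": True, "transmembrane": True},
    {"signal_peptide": False, "transmembrane": False},
    {"signal_peptide": False, "transmembrane": True},
    {"signal_peptide": False, "transmembrane": False},
]
train_y = [True, True, False, False, False]
model = train_naive_bayes(train_x, train_y)

# Hypothetical proteome to triage; the IDs are placeholders.
proteome = {
    "protA": {"signal_peptide": True, "transmembrane": False},
    "protB": {"signal_peptide": False, "transmembrane": False},
    "protC": {"signal_peptide": False, "transmembrane": True},
    "protD": {"signal_peptide": True, "transmembrane": True},
}
shortlist = top_fraction(model, proteome, 0.5)
```

Ranking by log odds rather than applying a fixed probability cutoff mirrors the "relative score" framing in the abstract: the fraction of the proteome carried forward (for example 20%) becomes a tunable parameter, and sensitivity can then be checked against the empirically identified antigens.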