Bastas Gerassimos, Sompuram Seshi R, Pierce Brian, Vani Kodela, Bogen Steven A
Department of Pathology & Laboratory Medicine, Boston University School of Medicine, Boston, MA 02118, USA.
Mol Cell Proteomics. 2008 Feb;7(2):247-56. doi: 10.1074/mcp.M700107-MCP200. Epub 2007 Sep 25.
We describe a new approach to identify proteins involved in disease pathogenesis. The technology, Epitope-Mediated Antigen Prediction (E-MAP), leverages the specificity of patients' immune responses to disease-relevant targets and requires no prior knowledge of the protein. E-MAP links pathologic antibodies of unknown specificity, isolated from patient sera, to their cognate antigens in the protein database. The E-MAP process first reconstructs a predicted epitope using a peptide combinatorial library. We then search the protein database for closely matching amino acid sequences. Previously published attempts to identify unknown antibody targets in this manner have largely been unsuccessful for two reasons: 1) short predicted epitopes yield too many irrelevant matches in a database search, and 2) the predicted epitopes may not represent the native antigen with sufficient fidelity. Using an in silico model, we demonstrate the critical threshold requirements for epitope length and epitope fidelity. We find that epitopes generally need at least seven amino acids, with an overall accuracy of >70% relative to the native protein, to correctly identify the protein in a nonredundant protein database search. We then confirmed these findings experimentally, using the predicted epitopes for four monoclonal antibodies. Since many predicted epitopes fail to reach the seven-amino-acid threshold, we demonstrate the efficacy of paired epitope searches. This is the first systematic analysis of the computational framework needed to make this approach viable, coupled with experimental validation.
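The database-search step and the two thresholds can be illustrated with a minimal Python sketch. It is not the authors' E-MAP implementation. It assumes an ungapped, identity-only comparison (a real search would likely use a substitution-aware similarity score), applies the seven-residue and >70% accuracy thresholds stated above, and treats a paired search as the intersection of the proteins matched by each of two shorter epitopes. The chance-match estimate assumes 20 equiprobable, independent amino acids, which is a deliberate oversimplification. The names `identity`, `search_epitope`, `paired_search`, `expected_random_hits` and `TOY_DB` are hypothetical, and `TOY_DB` holds made-up sequences, not real proteins.

```python
from typing import Dict, Set

# Thresholds taken from the abstract: >= 7 residues and > 70% accuracy.
MIN_EPITOPE_LENGTH = 7
MIN_IDENTITY = 0.70


def identity(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length peptides agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def search_epitope(epitope: str, database: Dict[str, str],
                   min_identity: float = MIN_IDENTITY) -> Set[str]:
    """Return IDs of proteins with any window scoring above min_identity.

    Hypothetical ungapped sliding-window search (no insertions/deletions).
    """
    k = len(epitope)
    hits = set()
    for protein_id, seq in database.items():
        for start in range(len(seq) - k + 1):
            if identity(epitope, seq[start:start + k]) > min_identity:
                hits.add(protein_id)
                break  # one matching window is enough to call a hit
    return hits


def paired_search(ep1: str, ep2: str, database: Dict[str, str],
                  min_identity: float = MIN_IDENTITY) -> Set[str]:
    """Assumed pairing rule: keep only proteins matched by BOTH epitopes."""
    return (search_epitope(ep1, database, min_identity)
            & search_epitope(ep2, database, min_identity))


def expected_random_hits(db_residues: int, k: int) -> float:
    """Expected exact chance matches of a k-mer, assuming uniform residues."""
    return db_residues / 20 ** k


# Made-up toy database for illustration only.
TOY_DB = {
    "P1": "MKTAYIAKQRQISFVKSHFSRQ",
    "P2": "GSHMLEDPVDAFQLAKDELNRT",
    "P3": "GGTAYIAGGGGG",
}

if __name__ == "__main__":
    # A 7-mer with one substitution (6/7 identical) still finds P1.
    print(search_epitope("AYLAKQR", TOY_DB))
    # A 5-mer alone matches two proteins; pairing it narrows the result.
    print(search_epitope("TAYIA", TOY_DB))
    print(paired_search("TAYIA", "SFVKS", TOY_DB))
    # Chance matches fall 20-fold per added residue under this model.
    print(expected_random_hits(10**8, 5), expected_random_hits(10**8, 7))
```

The toy run shows the two effects the abstract describes: a single short epitope can match unrelated sequences, and requiring a second epitope to hit the same protein filters those out. The 20-fold drop per residue in `expected_random_hits` is only a rough intuition for why length matters; real protein databases have non-uniform composition and redundancy.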