Boutz Daniel R, Horton Andrew P, Wine Yariv, Lavinder Jason J, Georgiou George, Marcotte Edward M
Center for Systems & Synthetic Biology, Institute for Cellular and Molecular Biology, Department of Biomedical Engineering, Department of Chemical Engineering, and Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas 78712, United States.
Anal Chem. 2014 May 20;86(10):4758-66. doi: 10.1021/ac4037679. Epub 2014 May 1.
Characterizing the in vivo dynamics of the polyclonal antibody repertoire in serum, such as that which might arise in response to stimulation with an antigen, is difficult due to the presence of many highly similar immunoglobulin proteins, each specified by distinct B lymphocytes. These challenges have precluded the use of conventional mass spectrometry for antibody identification based on peptide mass spectral matches to a genomic reference database. Recently, progress has been made using bottom-up analysis of serum antibodies by nanoflow liquid chromatography/high-resolution tandem mass spectrometry combined with a sample-specific antibody sequence database generated by high-throughput sequencing of individual B cell immunoglobulin variable domains (V genes). Here, we describe how intrinsic features of antibody primary structure, most notably the interspersed segments of variable and conserved amino acid sequences, generate recurring patterns in the corresponding peptide mass spectra of V gene peptides, greatly complicating the assignment of correct sequences to mass spectral data. We show that the standard method of decoy-based error modeling fails to account for the error introduced by these highly similar sequences, leading to a significant underestimation of the false discovery rate. Because of these effects, antibody-derived peptide mass spectra require increased stringency in their interpretation. The use of filters based on the mean precursor ion mass accuracy of peptide-spectrum matches is shown to be particularly effective in distinguishing between "true" and "false" identifications. These findings highlight important caveats associated with the use of standard database search and error-modeling methods with nonstandard data sets and custom sequence databases.
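As a rough illustration of the two quantities the abstract contrasts, the following Python sketch computes a simple decoy-based false discovery rate estimate and then applies a filter on the mean precursor mass error of each peptide's peptide-spectrum matches (PSMs). It is not the authors' pipeline. The `PSM` record, the function names, the use of the mean absolute error in ppm, the 2.0 ppm cutoff, and the example values are all assumptions made for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PSM:
    """Hypothetical peptide-spectrum match record (not from the paper)."""
    peptide: str
    observed_mz: float
    theoretical_mz: float
    is_decoy: bool


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed precursor mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def decoy_fdr(psms: list[PSM]) -> float:
    """Simple target-decoy estimate: decoy hits divided by target hits."""
    decoys = sum(1 for p in psms if p.is_decoy)
    targets = len(psms) - decoys
    return decoys / targets if targets else 0.0


def mean_abs_ppm_by_peptide(psms: list[PSM]) -> dict[str, float]:
    """Mean absolute precursor error (ppm) over all PSMs of each peptide."""
    errors = defaultdict(list)
    for p in psms:
        errors[p.peptide].append(abs(ppm_error(p.observed_mz, p.theoretical_mz)))
    return {pep: sum(vals) / len(vals) for pep, vals in errors.items()}


def filter_by_mean_ppm(psms: list[PSM], max_ppm: float = 2.0) -> list[PSM]:
    """Keep PSMs whose peptide has a mean absolute error within max_ppm.

    The 2.0 ppm default is an arbitrary illustrative cutoff.
    """
    means = mean_abs_ppm_by_peptide(psms)
    return [p for p in psms if means[p.peptide] <= max_ppm]


# Made-up example values, chosen only to exercise the functions.
psms = [
    PSM("EVQLVESGGGLVQPGGSLR", 500.0005, 500.0, False),
    PSM("EVQLVESGGGLVQPGGSLR", 500.0000, 500.0, False),
    PSM("DIQMTQSPSSLSASVGDR", 600.0030, 600.0, False),
    PSM("RLSGGPQVLGGGSEVLQVE", 700.0028, 700.0, True),
]
kept = filter_by_mean_ppm(psms)
print(f"decoy FDR before filter: {decoy_fdr(psms):.3f}")
print(f"decoy FDR after filter:  {decoy_fdr(kept):.3f}")
```

In this toy data the two matches with errors of several ppm are removed and only the two low-error matches remain. The sketch does not reproduce the abstract's central caveat: a decoy-based estimate cannot see false matches to highly similar target sequences, which is why an independent criterion such as mass accuracy is useful alongside it.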