Department of Chemistry, Vanderbilt University, Nashville, TN, USA.
Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA.
MAbs. 2020 Jan-Dec;12(1):1758291. doi: 10.1080/19420862.2020.1758291.
Germline rearrangement of antibody (Ab) variable (V), diversity (D), and joining (J) gene segments, together with somatic hypermutation, gives rise to the human Ab variable gene sequence repertoire. It is common to characterize single nucleotide frequencies of the variable region by alignment to species-specific wildtype germline genes. The increasing application of next-generation sequencing to immune repertoire studies has led to the compilation of increasingly large adaptive immunome receptor repertoire datasets. We have developed a method that maps the sequence of a target Ab onto an immunome dataset of 326 million human Ab sequences. For this purpose, we created a position- and gene-specific scoring matrix (PGSSM) and a corresponding antibody similarity score. We characterized our PGSSM score and found that it correlated strongly with the phylogenetic distance of 181,355 Ab sequences from GenBank across 20 species. Given only the PGSSMs and the amino acid sequence of an Ab, we obtained the most likely human nucleotide back-translation, achieving a nucleotide sequence recovery of 95.9% and 97.2% for human heavy and light chains, respectively. In conclusion, the score of our back-translation is a valuable estimate of the similarity of an Ab sequence to the natural human repertoire. As expected, Ab therapeutic molecules developed from a human source showed a higher similarity to the repertoire than engineered Abs. Thus, the PGSSM metric introduced here can be used to engineer human-like Ab therapeutics.
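To make the idea of scoring and back-translating against a position-specific matrix more concrete, the following is a minimal Python sketch. It is not the authors' PGSSM implementation: the abstract does not give the matrix construction, so the sketch assumes a single, ungapped, codon-aligned set of nucleotide sequences, independent positions, and a simple pseudocount. The function names (`build_pssm`, `score`, `back_translate`, `recovery`) and the toy sequences are hypothetical, and the gene-specific part of a PGSSM (one matrix per germline gene) is omitted.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"
# Standard genetic code in the conventional TCAG codon ordering.
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AA)}


def build_pssm(aligned_seqs, pseudocount=1.0):
    """Per-position log-probability of each base (assumes equal-length, ungapped input)."""
    total = len(aligned_seqs) + pseudocount * len(BASES)
    pssm = []
    for i in range(len(aligned_seqs[0])):
        counts = Counter(seq[i] for seq in aligned_seqs)
        pssm.append({b: math.log((counts[b] + pseudocount) / total) for b in BASES})
    return pssm


def score(pssm, nt_seq):
    """Sum of per-position log-probabilities; higher means more repertoire-like."""
    return sum(col[base] for col, base in zip(pssm, nt_seq))


def back_translate(pssm, aa_seq):
    """Pick, for each residue, the synonymous codon with the highest matrix score.

    Because positions are treated as independent, the per-codon maximum is also
    the highest-scoring nucleotide sequence that encodes aa_seq.
    """
    codons = []
    for k, aa in enumerate(aa_seq):
        cols = pssm[3 * k:3 * k + 3]
        candidates = [c for c, a in CODON_TABLE.items() if a == aa]
        codons.append(max(candidates, key=lambda c: score(cols, c)))
    return "".join(codons)


def recovery(predicted, reference):
    """Fraction of nucleotide positions that are identical."""
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


# Toy repertoire (hypothetical sequences, for illustration only).
repertoire = ["GAGGTGCAG", "GAGGTGCAG", "GAAGTGCAG", "GAGGTCCAG"]
pssm = build_pssm(repertoire)
predicted = back_translate(pssm, "EVQ")
print(predicted, recovery(predicted, "GAGGTGCAG"), score(pssm, predicted))
```

In this toy setting the back-translation score plays the role of the similarity estimate: a sequence whose best-scoring encoding still scores poorly is far from the sequences that built the matrix.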