Zhang Bochao, Meng Wenzhao, Prak Eline T Luning, Hershberg Uri
School of Biomedical Engineering, Science and Health Systems, 711 Bossone Building, Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104, USA.
Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, 405B Stellar Chance Labs, 422 Curie Boulevard, Philadelphia, PA 19104, USA.
J Immunol Methods. 2015 Dec;427:105-16. doi: 10.1016/j.jim.2015.10.009. Epub 2015 Nov 1.
Immune repertoires are collections of lymphocytes that express diverse antigen receptor gene rearrangements consisting of Variable (V), Diversity (D, in the case of heavy chains) and Joining (J) gene segments. Clonally related cells typically share the same germline gene segments and have highly similar junctional sequences within their third complementarity determining regions. Identifying clonal relatedness of sequences is a key step in the analysis of immune repertoires. The V gene is the most important for clone identification because it has the longest sequence and the greatest number of sequence variants. However, accurate identification of a clone's germline V gene source is challenging because there is a high degree of similarity between different germline V genes. This difficulty is compounded in antibodies, which can undergo somatic hypermutation. Furthermore, high-throughput sequencing experiments often generate partial sequences and have significant error rates. To address these issues, we describe a novel method to estimate which germline V genes (or alleles) cannot be discriminated under different conditions (read lengths, sequencing errors or somatic hypermutation frequencies). Starting with any set of germline V genes, this method measures their similarity at different sequencing lengths and calculates their likelihood of unambiguous assignment under different levels of mutation. Hence, under different experimental and biological conditions, one can identify the germline V genes (or alleles) that cannot be uniquely identified and bundle them together into groups of specific V genes with highly similar sequences.
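The bundling idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the likelihood of unambiguous assignment is computed, so this sketch substitutes a simple mismatch threshold (`max_mismatches`) for it. The gene names and sequences in `TOY_GERMLINES` are hypothetical, the sequences are assumed to be pre-aligned and of equal length, and reads are assumed to cover the 3' end of the V gene.

```python
from itertools import combinations

# Hypothetical, pre-aligned toy "germline" sequences (not real V genes).
TOY_GERMLINES = {
    "V_A*01": "ACGTACGTACGT",
    "V_A*02": "ACGAACGTACGT",  # differs from V_A*01 only near the 5' end
    "V_B*01": "TTGTCCGTACTT",
}


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def bundle_v_genes(germlines, read_length, max_mismatches):
    """Group genes that cannot be told apart at a given read length.

    Assumption: a read covers only the last `read_length` bases of the
    V gene, and two genes are treated as indistinguishable when their
    covered regions differ at no more than `max_mismatches` positions
    (a stand-in for tolerance to sequencing error or hypermutation).
    """
    trimmed = {name: seq[-read_length:] for name, seq in germlines.items()}
    parent = {name: name for name in trimmed}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge every pair of genes that falls within the mismatch tolerance.
    for a, b in combinations(trimmed, 2):
        if hamming(trimmed[a], trimmed[b]) <= max_mismatches:
            parent[find(a)] = find(b)

    groups = {}
    for name in trimmed:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    for length, tolerance in [(12, 0), (8, 0), (8, 2)]:
        print(length, tolerance, bundle_v_genes(TOY_GERMLINES, length, tolerance))
```

With full-length reads and no tolerance, all three toy genes stay separate. Shortening the read to 8 bases hides the only difference between the two `V_A` alleles, so they are bundled, and allowing 2 mismatches merges all three into one group. Grouping here is single-linkage, so a bundle can contain genes that are only indirectly similar; a stricter criterion may be preferable in practice.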