R&D Informatics, Centocor Discovery Research, San Diego, CA 92121, USA.
Mol Immunol. 2010 Jan;47(4):694-700. doi: 10.1016/j.molimm.2009.10.028. Epub 2009 Nov 24.
Determination of framework regions (FRs) and complementarity determining regions (CDRs) in an antibody is essential for understanding the underlying biology as well as for antibody engineering and optimization. However, there are no computational algorithms available to delimit an antibody sequence or a library of sequences into FRs and CDRs in a coherent and automatic fashion. Based upon the mapping relationships among mature antibody sequences and their corresponding germline gene segments, a novel computational algorithm has been developed for automatic determination of CDRs. Even though a human can make more than 10^12 different antibody molecules in the preimmune repertoire to fight off invading pathogens, these antibodies are generated from rearrangements of a very limited number of germline variable (V), diversity (D) and joining (J) gene segments, followed by somatic hypermutation. The framework regions FR1, FR2 and FR3 in mature antibodies are encoded by germline V gene segments, while FR4 is encoded by J gene segments. Since there are only a limited number of germline gene segments, these genes can be pre-delimited to generate a knowledge base of FRs and CDRs. Then, for a given antibody sequence, the algorithm scans each pre-delimited gene in the knowledge base, finds the best matching V and J segments, and identifies the FRs and CDRs accordingly. The described algorithm was stringently tested using nearly 25,000 human antibody sequences from NCBI and proved to be very robust. Over 99.7% of antibody sequences can be delimited computationally. Of those delimited sequences, only 0.28% have somatic insertions or deletions in FRs, and their delimited results need manual checking. Another feature of the algorithm is that it is independent of the CDR definition, and it can be easily extended to other CDR definitions besides the most widely used Kabat, Chothia and IMGT definitions.
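The scan-and-transfer idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy "germline" sequences, the region boundaries, the names `GermlineV`, `GermlineJ` and `delimit`, and the ungapped identity score used to pick the best match are all assumptions made for illustration. In particular, the sketch assumes the query starts at the first residue of the V segment and contains no insertions or deletions.

```python
from dataclasses import dataclass


@dataclass
class GermlineV:
    name: str
    seq: str
    regions: dict  # region name -> (start, end), half-open, pre-delimited


@dataclass
class GermlineJ:
    name: str
    seq: str
    fr4_start: int  # offset inside the J segment where FR4 begins


# Hypothetical toy knowledge base (NOT real germline genes or real
# Kabat/Chothia/IMGT boundaries).
V_BOUNDS = {"FR1": (0, 5), "CDR1": (5, 9), "FR2": (9, 14),
            "CDR2": (14, 18), "FR3": (18, 26)}
V_GENES = [
    GermlineV("V_TOY_A", "QVQLVGYTFWVRQAINPNRVTMTCAR", V_BOUNDS),
    GermlineV("V_TOY_B", "EVQLLGFTFWVRQAISGSRFTISCAK", V_BOUNDS),
]
J_GENES = [
    GermlineJ("J_TOY_1", "YFDYWGQGTLV", 4),
    GermlineJ("J_TOY_2", "AFDIWGQGTMV", 4),
]


def identity(a: str, b: str) -> int:
    """Count identical positions in an ungapped, left-aligned comparison."""
    return sum(x == y for x, y in zip(a, b))


def delimit(query: str, v_genes, j_genes):
    """Pick the best V and J segments and transfer their boundaries."""
    # V segments are compared against the start of the query.
    best_v = max(v_genes, key=lambda v: identity(query[:len(v.seq)], v.seq))
    # J segments are compared against the end of the query.
    best_j = max(j_genes, key=lambda j: identity(query[-len(j.seq):], j.seq))

    # FR1..FR3 and CDR1..CDR2 come straight from the V gene's boundaries.
    regions = {name: query[s:e] for name, (s, e) in best_v.regions.items()}

    # CDR3 spans from the end of FR3 to the start of FR4 (from the J gene).
    fr3_end = best_v.regions["FR3"][1]
    fr4_start = len(query) - len(best_j.seq) + best_j.fr4_start
    regions["CDR3"] = query[fr3_end:fr4_start]
    regions["FR4"] = query[fr4_start:]
    return best_v.name, best_j.name, regions


# Toy "mature" sequence: V_TOY_B with one substitution in CDR2,
# a short junction, then J_TOY_1.
QUERY = "EVQLLGFTFWVRQAISGTRFTISCAKDRGYFDYWGQGTLV"
v_name, j_name, regions = delimit(QUERY, V_GENES, J_GENES)
for region_name, region_seq in regions.items():
    print(region_name, region_seq)
```

Because the boundaries live in the knowledge base rather than in the matching code, supporting another CDR definition in this sketch only means supplying a different boundary table for each germline gene. A practical version would need a gapped alignment to cope with the somatic insertions and deletions mentioned above.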
In addition to delimiting antibody sequences into FRs and CDRs, the described algorithm is useful for sequence annotation and sequence quality control, because it detects unusual sequence patterns and features. Furthermore, it has been suggested that the algorithm may easily be embedded into other applications, for example to create a gene-family-specific PSSM (Position Specific Scoring Matrix) for antibody engineering or to number an antibody sequence automatically.
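As an illustration of the PSSM use case, the sketch below builds a simple log-odds matrix from equal-length sequences of one region that are assumed to come from the same gene family. The function name `build_pssm`, the uniform background frequency, the pseudocount and the toy input sequences are assumptions for illustration; the abstract does not specify how such a matrix would be constructed.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_seqs, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Return one {residue: log2-odds score} dict per aligned position.

    Assumes every sequence has the same length (no gaps) and a uniform
    background frequency over the alphabet.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all sequences must have the same length")

    background = 1.0 / len(alphabet)
    total = len(aligned_seqs) + pseudocount * len(alphabet)
    pssm = []
    for pos in range(length):
        # Observed residue counts in this column of the alignment.
        counts = Counter(s[pos] for s in aligned_seqs)
        column = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in alphabet
        }
        pssm.append(column)
    return pssm


# Toy FR2 segments, standing in for delimited output from one gene family.
fr2_family = ["WVRQA", "WVRQA", "WIRQA"]
pssm = build_pssm(fr2_family)
print(round(pssm[1]["V"], 3), round(pssm[1]["I"], 3))
```

Residues seen more often at a position score higher, so a matrix of this kind could be used to rank candidate substitutions at each framework position.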