Gelfand I M, Kister A E
Department of Mathematics, Rutgers University, New Brunswick, NJ 08903, USA.
Proc Natl Acad Sci U S A. 1997 Nov 11;94(23):12562-7. doi: 10.1073/pnas.94.23.12562.
Sequences of the variable heavy (VH) and kappa (Vkappa) domains of Ig structures were divided into 21 fragments that correspond to strands, loops, or parts of these structural units of the variable domains. Amino acid sequences of the fragments (termed "words") were collected from the 1,172 human heavy and 668 human kappa chains available in the Kabat database. Statistical analysis was performed on the words of 17 fragments (the fragments that comprise the complementarity-determining regions are not discussed in this paper). The number of different words (those that differ in at least one position) ranged, across fragments, from 11 to 75 in the kappa chains and from 23 to 189 in the heavy chains. The main result of this study is that very few keywords, or main patterns of words, were necessary to describe over 90% of the sequences (no more than two keywords per fragment in the kappa chains and no more than five per fragment in the heavy chains). No identical keywords were found for different fragments of the variable domains. Keywords of aligned fragments of the VH and Vkappa domains were different in all but two instances. Thus, knowing the keywords, one can determine whether any given small part of a sequence belongs to a heavy or kappa chain and predict its precise location in the sequence. In addition, by using all of the keywords obtained through analysis of the Kabat database, it was possible to describe completely the sequences of the human VH and Vkappa germ-line segments.
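The counting described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it treats a keyword as an exact word and greedily picks the most frequent words until a target coverage is reached, whereas the paper's keywords are main patterns of words. The function names, the 0.90 default, the toy words, and the fragment labels are all hypothetical and are not taken from the Kabat database.

```python
from collections import Counter


def distinct_words(words):
    """Number of different words, i.e. words that differ in at least one position."""
    return len(set(words))


def top_words_for_coverage(words, target=0.90):
    """Greedily pick the most frequent words until they cover `target` of all words.

    Simplifying assumption: a keyword is an exact word here, not a pattern.
    Returns (selected_words, achieved_coverage).
    """
    if not words:
        return [], 0.0
    counts = Counter(words)
    total = len(words)
    selected, covered = [], 0
    for word, n in counts.most_common():
        if covered / total >= target:
            break
        selected.append(word)
        covered += n
    return selected, covered / total


def locate(word, keyword_index):
    """Look up which (chain, fragment) pairs list `word` as a keyword."""
    return keyword_index.get(word, [])


# Toy data, invented for illustration: 20 words observed for one fragment.
toy_words = ["QVQLV"] * 19 + ["EVQLV"]
n_distinct = distinct_words(toy_words)
keywords, coverage = top_words_for_coverage(toy_words)

# Hypothetical index from keyword to (chain, fragment) labels.
keyword_index = {kw: [("VH", "fragment 1")] for kw in keywords}
hits = locate("QVQLV", keyword_index)
misses = locate("AAAAA", keyword_index)
```

Because the abstract reports that keywords are not shared between fragments, and that VH and Vkappa keywords coincide in only two instances, a lookup of this kind would usually return a single (chain, fragment) pair. The index holds a list of pairs so that those two shared cases can still be represented.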