Lee C E H, Gaëta B, Malming H R, Bain M E, Sewell W A, Collins A M
School of Biotechnology and Biomolecular Sciences, University of New South Wales, Kensington, Australia.
Immunogenetics. 2006 Jan;57(12):917-25. doi: 10.1007/s00251-005-0062-5. Epub 2006 Jan 10.
We have used a bioinformatics approach to evaluate the completeness and functionality of the reported human immunoglobulin heavy-chain IGHD gene repertoire. Using the hidden Markov-model-based iHMMune-align program, 1,080 relatively unmutated heavy-chain sequences were aligned against the reported repertoire. These alignments were compared with alignments to 1,639 more highly mutated sequences. Comparisons of the frequencies of gene utilization in the two databases, and analysis of features of aligned IGHD gene segments, including their length, the frequency with which they appear to mutate, and the frequency with which specific mutations were seen, were used to determine the reliability of alignments to the less commonly seen IGHD genes. Analysis demonstrates that IGHD4-23 and IGHD5-24, which have been reported to be open reading frames of uncertain functionality, are represented in the expressed gene repertoire; however, the functionality of IGHD6-25 must be questioned. Sequence similarities make the unequivocal identification of members of the IGHD1 gene family problematic, although all genes except IGHD1-14*01 appear to be functional. On the other hand, reported allelic variants of IGHD2-2 and of the IGHD3 gene family appear to be nonfunctional, very rare, or nonexistent. Analysis also suggests that the reported repertoire is relatively complete, although one new putative polymorphism (IGHD3-10*p03) was identified. This study therefore confirms a surprising lack of diversity in the available IGHD gene repertoire, and restriction of the germline sequence databases to the functional set described here will substantially improve the accuracy of IGHD gene alignments and therefore the accuracy of analysis of the V-D-J junction.
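The comparison of gene utilization frequencies between the two sequence sets can be illustrated with a minimal Python sketch. It is not the iHMMune-align program or the authors' analysis code. The function names `usage_frequencies` and `compare_usage` are hypothetical, and the gene calls are invented toy values, not data from the study. It assumes each aligned sequence has already been assigned a single IGHD gene label.

```python
from collections import Counter


def usage_frequencies(gene_calls):
    """Return the fraction of sequences assigned to each IGHD gene."""
    counts = Counter(gene_calls)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


def compare_usage(unmutated_calls, mutated_calls):
    """Pair each gene's frequency in the two sets: (unmutated, mutated)."""
    f_unmut = usage_frequencies(unmutated_calls)
    f_mut = usage_frequencies(mutated_calls)
    genes = sorted(set(f_unmut) | set(f_mut))
    # A gene absent from one set gets frequency 0.0 there.
    return {g: (f_unmut.get(g, 0.0), f_mut.get(g, 0.0)) for g in genes}


# Toy gene calls, invented purely for illustration.
unmutated = ["IGHD3-10", "IGHD3-10", "IGHD2-2", "IGHD6-25"]
mutated = ["IGHD3-10", "IGHD2-2", "IGHD2-2", "IGHD4-23"]

table = compare_usage(unmutated, mutated)
for gene, (f_u, f_m) in table.items():
    print(f"{gene}\tunmutated={f_u:.2f}\tmutated={f_m:.2f}")
```

A gene that appears at a markedly different frequency in the mutated set than in the unmutated set would be a candidate for closer inspection, since the difference may reflect unreliable alignments rather than real usage.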