Wang Yan, Jackson Katherine J L, Sewell William A, Collins Andrew M
School of Biotechnology and Biomolecular Sciences, University of New South Wales, Kensington, New South Wales, Australia.
Immunol Cell Biol. 2008 Feb;86(2):111-5. doi: 10.1038/sj.icb.7100144. Epub 2007 Nov 27.
The identification of the genes that make up rearranged immunoglobulin genes is critical to many studies. For example, the enumeration of mutations in immunoglobulin genes is important for the prognosis of chronic lymphocytic leukemia, and this requires the accurate identification of the germline genes from which a particular sequence is derived. The immunoglobulin heavy-chain variable (IGHV) gene repertoire is generally considered to be highly polymorphic. In this report, we describe a bioinformatic analysis of germline and rearranged immunoglobulin gene sequences which casts doubt on the existence of a substantial proportion of reported germline polymorphisms. We report a five-level classification system for IGHV genes, which indicates the likelihood that the genes have been reported accurately. The classification scheme also reflects the likelihood that germline genes could be incorrectly identified in mutated VDJ rearrangements, because of similarities to other alleles. Of the 226 IGHV alleles that have previously been reported, our analysis suggests that 104 of these alleles almost certainly include sequence errors, and should be removed from the available repertoire. The analysis also highlights the presence of common mismatches, with respect to the germline, in many rearranged heavy-chain sequences, suggesting the existence of twelve previously unreported alleles. Sequencing of IGHV genes from six individuals in this study confirmed the existence of three of these alleles, which we designate IGHV3-49*04, IGHV3-49*05 and IGHV4-39*07. We therefore present a revised repertoire of expressed IGHV genes, which should substantially improve the accuracy of immunoglobulin gene analysis.
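The reasoning behind the "common mismatches" observation can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `recurrent_mismatches`, the `min_fraction` threshold, and the toy sequences are all assumptions made for illustration, and the sequences are assumed to be already aligned to the germline allele, equal in length, and free of gaps. The idea is that somatic mutations tend to be scattered across positions, whereas a mismatch that recurs at the same position with the same nucleotide in many independent rearrangements is more plausibly an unreported germline allele.

```python
from collections import Counter


def recurrent_mismatches(germline, rearranged, min_fraction=0.5):
    """Find positions where one non-germline base recurs across sequences.

    Returns {position: (germline_base, variant_base, fraction)} for every
    0-based position at which the most common non-germline base appears in
    at least `min_fraction` of the rearranged sequences.
    Assumes all sequences are pre-aligned to `germline` and of equal length.
    """
    if not rearranged:
        return {}
    hits = {}
    for pos, ref in enumerate(germline):
        # Count only the bases that differ from the germline at this position.
        counts = Counter(seq[pos] for seq in rearranged if seq[pos] != ref)
        if not counts:
            continue
        base, n = counts.most_common(1)[0]
        fraction = n / len(rearranged)
        if fraction >= min_fraction:
            hits[pos] = (ref, base, fraction)
    return hits


# Toy data invented for illustration; these are not real IGHV sequences.
germline = "ACGTACGTAC"
rearranged = [
    "ACGTTCGTAC",  # A->T at position 4
    "ACGTTCGTAC",  # A->T at position 4
    "ACGTTCGAAC",  # A->T at position 4, plus T->A at position 7
    "ACCTACGTAC",  # G->C at position 2 only
]

candidates = recurrent_mismatches(germline, rearranged)
print(candidates)  # {4: ('A', 'T', 0.75)}
```

In this toy set, only the position-4 mismatch recurs often enough to be flagged; the isolated differences at positions 2 and 7 look like ordinary somatic mutations. A real analysis would also need a sensible threshold, independent rearrangements from multiple individuals, and confirmation by germline sequencing, as the abstract describes.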