Stewart C Andrew, Horton Roger, Allcock Richard J N, Ashurst Jennifer L, Atrazhev Alexey M, Coggill Penny, Dunham Ian, Forbes Simon, Halls Karen, Howson Joanna M M, Humphray Sean J, Hunt Sarah, Mungall Andrew J, Osoegawa Kazutoyo, Palmer Sophie, Roberts Anne N, Rogers Jane, Sims Sarah, Wang Yu, Wilming Laurens G, Elliott John F, de Jong Pieter J, Sawcer Stephen, Todd John A, Trowsdale John, Beck Stephan
Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.
Genome Res. 2004 Jun;14(6):1176-87. doi: 10.1101/gr.2188104. Epub 2004 May 12.
The future systematic mapping of variants that confer susceptibility to common diseases requires the construction of a fully informative polymorphism map. Ideally, every base pair of the genome would be sequenced in many individuals. Here, we report 4.75 Mb of contiguous sequence for each of two common haplotypes of the major histocompatibility complex (MHC), to which susceptibility to >100 diseases has been mapped. The autoimmune disease-associated haplotypes HLA-A3-B7-Cw7-DR15 and HLA-A1-B8-Cw7-DR3 were sequenced in their entirety through a bacterial artificial chromosome (BAC) cloning strategy using the consanguineous cell lines PGF and COX, respectively. The two sequences were annotated to encompass all described splice variants of expressed genes. We defined the complete variation content of the two haplotypes, revealing >18,000 variations between them. Average SNP densities ranged from less than one SNP per kilobase to >60 per kilobase. Acquisition of complete and accurate sequence data over polymorphic regions such as the MHC from large-insert cloned DNA provides a definitive resource for the construction of informative genetic maps, and avoids the limitation of chromosome regions that are refractory to PCR amplification.
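The SNP density figures quoted above are per-kilobase counts of single-base differences between the two haplotypes. As a minimal sketch of how such a figure could be computed, and not the authors' actual pipeline, the following assumes the two haplotype sequences have already been aligned to equal length with `-` marking gaps. The function name `snp_density_per_kb` and the fixed, non-overlapping 1 kb window are illustrative assumptions.

```python
def snp_density_per_kb(hap_a: str, hap_b: str, window: int = 1000) -> list[float]:
    """Return SNPs per kilobase for consecutive windows of two aligned haplotypes.

    Assumption: hap_a and hap_b are pre-aligned strings of equal length,
    with '-' used for alignment gaps (hypothetical input format).
    """
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must be aligned to the same length")
    densities = []
    for start in range(0, len(hap_a), window):
        a = hap_a[start:start + window]
        b = hap_b[start:start + window]
        # Count substitutions only; columns containing a gap are indels, not SNPs.
        snps = sum(1 for x, y in zip(a, b) if x != y and x != "-" and y != "-")
        # Scale to a per-kilobase rate (the last window may be shorter).
        densities.append(snps * 1000 / len(a))
    return densities


if __name__ == "__main__":
    # Toy data: an identical first kilobase, then 60 substitutions in the second.
    hap_a = "A" * 1000 + "C" * 1000
    hap_b = "A" * 1000 + "C" * 940 + "G" * 60
    print(snp_density_per_kb(hap_a, hap_b))
```

For scale, >18,000 variations across 4.75 Mb corresponds to roughly 3.8 variations per kilobase overall, though that total may include variants other than SNPs.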