HLA diversity in the 1000 genomes dataset.

Author information

Gourraud Pierre-Antoine, Khankhanian Pouya, Cereb Nezih, Yang Soo Young, Feolo Michael, Maiers Martin, Rioux John D, Hauser Stephen, Oksenberg Jorge

Affiliations

Department of Neurology, University of California San Francisco, San Francisco, California, United States of America.

Histogenetics Inc., Ossining, New York, United States of America.

Publication information

PLoS One. 2014 Jul 2;9(7):e97282. doi: 10.1371/journal.pone.0097282. eCollection 2014.

Abstract

The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation by sequencing at a level that should allow the genome-wide detection of most variants with frequencies as low as 1%. However, in the major histocompatibility complex (MHC), only the 10 most frequent haplotypes are in the 1% frequency range, whereas thousands of haplotypes are present at lower frequencies. Given the limited coverage and read length of the sequences generated by the 1000 Genomes Project, the highly variable positions that define HLA alleles may be difficult to identify. We used classical Sanger sequencing techniques to type the HLA-A, HLA-B, HLA-C, HLA-DRB1 and HLA-DQB1 genes in the available 1000 Genomes samples and combined the results with the 103,310 variants in the MHC region genotyped by the 1000 Genomes Project. Using pairwise identity-by-descent distances between individuals and principal component analysis, we established the relationship between ancestry and genetic diversity in the MHC region. As expected, both the MHC variants and the HLA phenotype can identify the major ancestry lineage, informed mainly by the most frequent HLA haplotypes. To some extent, regions of the genome with similar genetic diversity or similar recombination rates have similar properties. An MHC-centric analysis underlines departures between the ancestral background of the MHC and the genome-wide picture. Our analysis of linkage disequilibrium (LD) decay in these samples suggests that pairwise LD is overestimated because of limited sampling of the MHC diversity. This collection of HLA-specific MHC variants, available on the dbMHC portal, is a valuable resource for future analyses of the role of the MHC in population and disease studies.
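The abstract's remark that pairwise LD can be overestimated under limited sampling can be illustrated with a small simulation. The sketch below is not the authors' analysis. It uses two simulated, fully independent biallelic loci, with an arbitrary allele frequency of 0.3 and hypothetical function names, to show the general effect that the sample r² statistic is biased upward when few haplotypes are sampled.

```python
import random


def r_squared(hap_a, hap_b):
    """Squared correlation (r^2) between two biallelic loci.

    Each argument is a list of 0/1 alleles, one entry per sampled haplotype.
    Returns None when either locus is monomorphic in the sample.
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None  # r^2 is undefined without variation at both loci
    d = p_ab - p_a * p_b  # the classical disequilibrium coefficient D
    return d * d / denom


def mean_r2_independent(n_haplotypes, freq=0.3, replicates=2000, seed=0):
    """Average sample r^2 for two loci simulated with no true LD.

    freq, replicates and seed are illustrative assumptions, not values
    taken from the study.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(replicates):
        a = [1 if rng.random() < freq else 0 for _ in range(n_haplotypes)]
        b = [1 if rng.random() < freq else 0 for _ in range(n_haplotypes)]
        r2 = r_squared(a, b)
        if r2 is not None:
            values.append(r2)
    return sum(values) / len(values)


if __name__ == "__main__":
    # The true r^2 is 0 in every case; smaller samples report larger values.
    for n in (20, 100, 1000):
        print(n, round(mean_r2_independent(n), 4))
```

Because the simulated loci are independent, any nonzero average is sampling noise alone. With real MHC data, true LD and population structure would be layered on top of this effect.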

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0288/4079705/8853e2e6d0e2/pone.0097282.g001.jpg
