Long-read sequencing reveals novel genetic polymorphisms in the major histocompatibility complex region and their impacts on the Han Chinese population.
Author information
Zhou Cong, Gong Tingting, Li Shuhang, Jin Li, Fan Shaohua
Affiliations
State Key Laboratory of Genetic Engineering, Lab for Evolutionary Synthesis, School of Life Sciences, Human Phenome Institute, Fudan University, Shanghai, 200438, China.
Research Unit of Dissecting the Population Genetics and Developing New Technologies for Treatment and Prevention of Skin Phenotypes and Dermatological Diseases (2019RU058), Chinese Academy of Medical Sciences, Shanghai, 210042, China.
Publication information
Sci China Life Sci. 2025 May;68(5):1400-1409. doi: 10.1007/s11427-024-2742-y. Epub 2025 Jan 13.
Human leukocyte antigen (HLA) genes in the major histocompatibility complex (MHC) region are crucial for immunity and are associated with numerous diseases and phenotypes. The MHC region's complexity and high genetic diversity make it challenging to analyze using short-read sequencing (SRS) technology. We sequence the MHC region of 100 Han Chinese individuals using both long-read sequencing (LRS) and SRS platforms at approximately 30X coverage to study genetic alterations and their potential functional impacts. LRS provides significantly greater coverage of the MHC region and eight classical HLA genes, particularly at the HLA-DRB1 locus, compared with SRS. We detect 78,249 single nucleotide polymorphisms (SNPs) using LRS, with 26.0% undetectable by SRS. Based on SNP and inferred HLA allele types, we construct an LRS-based MHC reference panel for the Han Chinese, containing approximately 2.6 times more genetic variants than the SRS-based Han-MHC reference panel. A phenome-wide association study assessing 26,024 phenotypes across 15 categories identifies significant associations for 7,879 independent variants (including 809 LRS-specific SNPs) with 409 phenotypes in nine categories. This analysis reveals 24 unreported HLA allele associations in the bioelectric and cellular categories. The conditional analysis identifies 530 independent signals across the 409 phenotypes, including 28 previously unreported signals of eight classical HLA genes associated with 33 phenotypes. Of the top-associated SNPs, 191 are detected by LRS only. Fine-mapping identifies 126 independent candidate causal SNPs for three immune-related cellular phenotypes, with 17 detected exclusively by LRS. Our study reveals previously unreported variants and their functional impacts in the MHC region, enhancing our understanding of genetic diversity and its potential biological implications in the Han Chinese population.
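The abstract reports several counts as absolute numbers and one as a percentage, so the following minimal Python sketch converts them to a common scale. It uses only figures quoted above. The variable names and the `share` helper are illustrative assumptions, not part of the study, and the LRS-only SNP count is an approximation because the 26.0% figure is itself rounded.

```python
# Figures quoted in the abstract.
total_lrs_snps = 78_249        # SNPs detected with LRS
pct_missed_by_srs = 26.0       # percent of those undetectable by SRS (rounded in the source)
independent_variants = 7_879   # independent variants with significant PheWAS associations
lrs_specific_variants = 809    # of those, LRS-specific SNPs
candidate_causal_snps = 126    # fine-mapped candidate causal SNPs
lrs_only_causal_snps = 17      # of those, detected exclusively by LRS


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage (hypothetical helper for this sketch)."""
    return 100.0 * part / whole


# Approximate count only: the source percentage is rounded to one decimal place.
approx_lrs_only_snps = round(total_lrs_snps * pct_missed_by_srs / 100)

lrs_share_of_variants = share(lrs_specific_variants, independent_variants)
lrs_share_of_causal = share(lrs_only_causal_snps, candidate_causal_snps)

print(f"Approximate SNPs missed by SRS: {approx_lrs_only_snps}")
print(f"LRS-specific share of associated variants: {lrs_share_of_variants:.1f}%")
print(f"LRS-only share of candidate causal SNPs: {lrs_share_of_causal:.1f}%")
```

Read this way, roughly one in ten associated variants and about one in seven fine-mapped candidates come from SNPs that SRS did not detect.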