
High-Resolution Characterization of KIR Genes in a Large North American Cohort Reveals Novel Details of Structural and Sequence Diversity.

Author information

Programa de Pós-Graduação em Genética, Universidade Federal do Paraná, Curitiba, Brazil.

Department of Neurology, University of California, San Francisco, CA, United States.

Publication information

Front Immunol. 2021 May 7;12:674778. doi: 10.3389/fimmu.2021.674778. eCollection 2021.

Abstract

The KIR (killer-cell immunoglobulin-like receptor) region is characterized by structural variation and high sequence similarity among genes, which impose technical difficulties for analysis. We undertook the most comprehensive study to date of KIR genetic diversity in a large population sample, applying next-generation sequencing to 2,130 United States individuals of European descent. Data were analyzed using our custom bioinformatics pipeline, designed specifically to address the technical obstacles in determining KIR genotypes. Precise gene copy number determination allowed us to identify a set of uncommon gene-content KIR haplotypes accounting for 5.2% of structural variation. In this cohort, the framework gene that varies most in copy number does so in 6.5% of all individuals. We identified phased high-resolution alleles in large multi-locus insertions, as well as the likely founder haplotypes from which they were deleted. Additionally, we observed 250 alleles at 5-digit resolution, of which 90 have frequencies ≥1%. We found sequence patterns consistent with the presence of novel alleles in 398 (18.7%) individuals and contextualized multiple orphan dbSNPs within the KIR complex. We also identified a novel KIR2DL1 variant, Pro151Arg, and showed by molecular dynamics that this substitution is predicted to affect interaction with HLA-C. No previous study has explored the full range of structural and sequence variation of KIR as we present here. We demonstrate that pairing high-throughput sequencing with state-of-the-art computational tools in a large cohort permits exploration of all aspects of KIR variation, including determination of population-level haplotype diversity, improving understanding of the KIR system and providing an important reference for future studies.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e5ce/8137979/10a971b7a873/fimmu-12-674778-g001.jpg
