
Sequences of 95 human MHC haplotypes reveal extreme coding variation in genes other than highly polymorphic HLA class I and II.

Author information

Norman Paul J, Norberg Steven J, Guethlein Lisbeth A, Nemat-Gorgani Neda, Royce Thomas, Wroblewski Emily E, Dunn Tamsen, Mann Tobias, Alicata Claudia, Hollenbach Jill A, Chang Weihua, Shults Won Melissa, Gunderson Kevin L, Abi-Rached Laurent, Ronaghi Mostafa, Parham Peter

Affiliations

Departments of Structural Biology and Microbiology & Immunology, Stanford University School of Medicine, Stanford, California 94305, USA.

Illumina Incorporated, San Diego, California 92122, USA.

Publication information

Genome Res. 2017 May;27(5):813-823. doi: 10.1101/gr.213538.116. Epub 2017 Mar 30.

Abstract

The most polymorphic part of the human genome, the MHC, encodes over 160 proteins of diverse function. Half of them, including the HLA class I and II genes, are directly involved in immune responses. Consequently, the MHC region strongly associates with numerous diseases and clinical therapies. Notoriously, the MHC region has been intractable to high-throughput analysis at complete sequence resolution, and current reference haplotypes are inadequate for large-scale studies. To address these challenges, we developed a method that specifically captures and sequences the 4.8-Mbp MHC region from genomic DNA. For 95 MHC homozygous cell lines we assembled, de novo, a set of high-fidelity contigs and a sequence scaffold, representing a mean 98% of the target region. Included are six alternative MHC reference sequences of the human genome that we completed and refined. Characterization of the sequence and structural diversity of the MHC region shows the approach accurately determines the sequences of the highly polymorphic HLA class I and HLA class II genes and the complex structural diversity of complement factor C4A/C4B. It has also uncovered extensive and unexpected diversity in other MHC genes; an example is MUC22, which encodes a lung mucin and exhibits more coding sequence alleles than any HLA class I or II gene studied here. More than 60% of the coding sequence alleles analyzed were previously uncharacterized. We have created a substantial database of robust reference haplotype sequences that will enable future population scale studies of this complicated and clinically important region of the human genome.


