Complex genetic variation in nearly complete human genomes.

Authors

Logsdon Glennis A, Ebert Peter, Audano Peter A, Loftus Mark, Porubsky David, Ebler Jana, Yilmaz Feyza, Hallast Pille, Prodanov Timofey, Yoo DongAhn, Paisie Carolyn A, Harvey William T, Zhao Xuefang, Martino Gianni V, Henglin Mir, Munson Katherine M, Rabbani Keon, Chin Chen-Shan, Gu Bida, Ashraf Hufsah, Austine-Orimoloye Olanrewaju, Balachandran Parithi, Bonder Marc Jan, Cheng Haoyu, Chong Zechen, Crabtree Jonathan, Gerstein Mark, Guethlein Lisbeth A, Hasenfeld Patrick, Hickey Glenn, Hoekzema Kendra, Hunt Sarah E, Jensen Matthew, Jiang Yunzhe, Koren Sergey, Kwon Youngjun, Li Chong, Li Heng, Li Jiaqi, Norman Paul J, Oshima Keisuke K, Paten Benedict, Phillippy Adam M, Pollock Nicholas R, Rausch Tobias, Rautiainen Mikko, Scholz Stephan, Song Yuwei, Söylev Arda, Sulovari Arvis, Surapaneni Likhitha, Tsapalou Vasiliki, Zhou Weichen, Zhou Ying, Zhu Qihui, Zody Michael C, Mills Ryan E, Devine Scott E, Shi Xinghua, Talkowski Mike E, Chaisson Mark J P, Dilthey Alexander T, Konkel Miriam K, Korbel Jan O, Lee Charles, Beck Christine R, Eichler Evan E, Marschall Tobias

Affiliations

Perelman School of Medicine, University of Pennsylvania, Department of Genetics, Epigenetics Institute, Philadelphia, PA, USA.

Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.

Publication information

bioRxiv. 2024 Sep 25:2024.09.24.614721. doi: 10.1101/2024.09.24.614721.

Abstract

Diverse sets of complete human genomes are required to construct a pangenome reference and to understand the extent of complex structural variation. Here, we sequence 65 diverse human genomes and build 130 haplotype-resolved assemblies (130 Mbp median continuity), closing 92% of all previous assembly gaps and reaching telomere-to-telomere (T2T) status for 39% of the chromosomes. We highlight complete sequence continuity of complex loci, including the major histocompatibility complex (MHC) and other gene loci [names missing from this copy], and fully resolve 1,852 complex structural variants (SVs). In addition, we completely assemble and validate 1,246 human centromeres. We find up to 30-fold variation in α-satellite high-order repeat (HOR) array length and characterize the pattern of mobile element insertions into α-satellite HOR arrays. While most centromeres predict a single site of kinetochore attachment, epigenetic analysis suggests the presence of two hypomethylated regions for 7% of centromeres. Combining our data with the draft pangenome reference significantly enhances genotyping accuracy from short-read data, enabling whole-genome inference to a median quality value (QV) of 45. Using this approach, 26,115 SVs per sample are detected, substantially increasing the number of SVs now amenable to downstream disease association studies.
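As a reading aid for the QV figure quoted above: the abstract does not define QV, so the sketch below assumes the usual Phred-scale convention, QV = -10 * log10(per-base error rate). Under that assumption a QV of 45 corresponds to roughly one error per 31,600 bases. The function names are illustrative and do not come from the paper.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error rate.

    Assumes QV = -10 * log10(error_rate); the abstract does not state
    the convention it uses.
    """
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Convert a per-base error rate (0 < rate <= 1) to a Phred-scaled QV."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in the interval (0, 1]")
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    rate = qv_to_error_rate(45)
    print(f"QV 45 -> error rate {rate:.3e}, about 1 error per {1 / rate:,.0f} bases")
```

Because the scale is logarithmic, each 10-point gain in QV means a tenfold drop in the error rate.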

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/42d3/11451754/de547f66c11f/nihpp-2024.09.24.614721v1-f0007.jpg
