Gaps and complex structurally variant loci in phased genome assemblies.

Affiliations

Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA.

Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany.

Publication information

Genome Res. 2023 Apr;33(4):496-510. doi: 10.1101/gr.277334.122. Epub 2023 May 10.

DOI: 10.1101/gr.277334.122
PMID: 37164484
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10234299/

Abstract

There has been tremendous progress in phased genome assembly production by combining long-read data with parental information or linked-read data. Nevertheless, a typical phased genome assembly generated by trio-hifiasm still contains more than 140 gaps. We perform a detailed analysis of gaps, assembly breaks, and misorientations from 182 haploid assemblies obtained from a diversity panel of 77 unique human samples. Although trio-based approaches using HiFi reads are the current gold standard, chromosome-wide phasing accuracy is comparable when using Strand-seq instead of parental data. Importantly, the majority of assembly gaps cluster near the largest and most identical repeats (including segmental duplications [35.4%], satellite DNA [22.3%], or regions enriched in GA/AT-rich DNA [27.4%]). Consequently, 1513 protein-coding genes overlap assembly gaps in at least one haplotype, and 231 are recurrently disrupted or missing from five or more haplotypes. Furthermore, we estimate that 6–7 Mbp of DNA are misoriented per haplotype irrespective of whether trio-free or trio-based approaches are used. Of these misorientations, 81% correspond to bona fide large inversion polymorphisms in the human species, most of which are flanked by large segmental duplications. We also identify large-scale alignment discontinuities consistent with 11.9 Mbp of deletions and 161.4 Mbp of insertions per haploid genome. Although 99% of this variation corresponds to satellite DNA, we identify 230 regions of euchromatic DNA with frequent expansions and contractions, nearly half of which overlap with 197 protein-coding genes. Such variable and incompletely assembled regions are important targets for future algorithmic development and pangenome representation.
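
The two gene counts above (genes overlapping a gap in at least one haplotype, and genes recurrently affected in five or more haplotypes) boil down to an interval-overlap tally across haplotypes. The sketch below illustrates only that counting logic and is not the authors' pipeline. It assumes, hypothetically, that gaps and gene loci are already available as BED-like, 0-based half-open intervals in one shared coordinate system (for example, after projecting each haplotype's gaps onto a common reference); the function names, gene names, and coordinates are all made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical BED-like interval: (chromosome, start, end), 0-based and half-open.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def haplotypes_per_gene(
    gaps_by_hap: Dict[str, List[Interval]],
    genes: Dict[str, Interval],
) -> Dict[str, Set[str]]:
    """For each gene, collect the haplotypes in which it overlaps at least one gap.

    Naive O(genes x gaps) scan; a genome-scale analysis would use an interval
    tree or a sorted sweep (e.g. bedtools intersect) instead.
    """
    hits: Dict[str, Set[str]] = defaultdict(set)
    for hap, gaps in gaps_by_hap.items():
        for gene, locus in genes.items():
            if any(overlaps(gap, locus) for gap in gaps):
                hits[gene].add(hap)
    return dict(hits)


def summarize(hits: Dict[str, Set[str]], min_haplotypes: int = 5) -> Tuple[int, List[str]]:
    """Return (#genes hit in >= 1 haplotype, sorted genes hit in >= min_haplotypes)."""
    recurrent = sorted(g for g, haps in hits.items() if len(haps) >= min_haplotypes)
    return len(hits), recurrent


# Toy data (hypothetical): GENE_A sits under a gap shared by six haplotypes,
# GENE_C is hit in one haplotype only, and GENE_B is never hit.
genes = {
    "GENE_A": ("chr1", 100, 200),
    "GENE_B": ("chr1", 500, 600),
    "GENE_C": ("chr2", 0, 50),
}
gaps_by_hap = {f"hap{i}": [("chr1", 150, 160)] for i in range(6)}
gaps_by_hap["hap0"].append(("chr2", 40, 45))

hits = haplotypes_per_gene(gaps_by_hap, genes)
print(summarize(hits))  # (2, ['GENE_A'])
```

Counting per gene rather than per gap keeps "overlaps a gap in at least one haplotype" and "recurrently disrupted in five or more haplotypes" as two thresholds on the same set, so changing `min_haplotypes` is the only knob needed to move between them.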

Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8d5e/10234299/c7e957330046/496f01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8d5e/10234299/380107f12c38/496f02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8d5e/10234299/a0d7a5914372/496f03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8d5e/10234299/bb36888c0726/496f04.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8d5e/10234299/2e8dbd0c2e6d/496f05.jpg