The GIAB genomic stratifications resource for human reference genomes.

Affiliations

Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA.

Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.

Publication information

Nat Commun. 2024 Oct 19;15(1):9029. doi: 10.1038/s41467-024-53260-y.

Abstract

Despite the growing variety of sequencing and variant-calling tools, no workflow performs equally well across the entire human genome. Understanding context-dependent performance is critical for enabling researchers, clinicians, and developers to make informed tradeoffs when selecting sequencing hardware and software. Here we describe a set of "stratifications," which are BED files that define distinct contexts throughout the genome. We define these for GRCh37/38 as well as the new T2T-CHM13 reference, adding many new hard-to-sequence regions which are critical for understanding performance as the field progresses. Specifically, we highlight the increase in hard-to-map and GC-rich stratifications in CHM13 relative to the previous references. We then compare the benchmarking performance with each reference and show the performance penalty brought about by these additional difficult regions in CHM13. Additionally, we demonstrate how the stratifications can track context-specific improvements over different platform iterations, using Oxford Nanopore Technologies as an example. The means to generate these stratifications are available as a snakemake pipeline at https://github.com/usnistgov/giab-stratifications. We anticipate this being useful in enabling precise risk-reward calculations when building sequencing pipelines for any of the commonly used reference genomes.
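To make the idea of stratified benchmarking concrete, the following is a minimal sketch in Python. It is not the GIAB snakemake pipeline or the implementation of any benchmarking tool. It parses BED intervals (0-based, half-open) and reports recall separately for each stratification, so that performance in, say, GC-rich or hard-to-map regions can be compared with performance elsewhere. The function names (load_bed, in_region, stratified_recall), the stratum names, and the toy coordinates and truth calls are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into {chrom: [(start, end), ...]}.

    BED coordinates are 0-based and half-open: a record "chr1 100 200"
    covers positions 100..199. Blank, comment, track and browser lines
    are skipped.
    """
    regions = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        regions[chrom].append((int(start), int(end)))
    return dict(regions)


def in_region(regions, chrom, pos0):
    """True if the 0-based position pos0 on chrom falls inside any interval."""
    return any(start <= pos0 < end for start, end in regions.get(chrom, ()))


def stratified_recall(truth_calls, strata):
    """Recall, TP / (TP + FN), of truth variants computed per stratum.

    truth_calls: iterable of (chrom, pos0, is_true_positive), one entry per
    benchmark (truth) variant. strata: {stratum_name: regions}.
    Returns {stratum_name: recall}, or None for a stratum with no variants.
    """
    counts = {name: [0, 0] for name in strata}  # [TP, FN] per stratum
    for chrom, pos0, is_tp in truth_calls:
        for name, regions in strata.items():
            if in_region(regions, chrom, pos0):
                counts[name][0 if is_tp else 1] += 1
    return {
        name: (tp / (tp + fn) if tp + fn else None)
        for name, (tp, fn) in counts.items()
    }


# Hypothetical toy strata and truth calls (not real GIAB data).
strata = {
    "gc_rich": load_bed(["chr1\t100\t200\n", "chr1\t500\t600\n"]),
    "low_mappability": load_bed(["chr1\t150\t550\n"]),
}
truth = [
    ("chr1", 120, True), ("chr1", 160, False), ("chr1", 300, True),
    ("chr1", 520, False), ("chr1", 580, True), ("chr1", 900, True),
]
recalls = stratified_recall(truth, strata)
overall = sum(is_tp for _, _, is_tp in truth) / len(truth)
print(recalls, round(overall, 3))
```

With these toy inputs, recall is 0.5 in the gc_rich stratum and 1/3 in the low_mappability stratum, while the genome-wide figure is 4/6, which illustrates how a single overall number can hide context-specific weaknesses. The linear interval scan keeps the sketch short; genome-scale BED files would call for an interval index or a dedicated benchmarking tool.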
