High-coverage nanopore sequencing of samples from the 1000 Genomes Project to build a comprehensive catalog of human genetic variation.

Author affiliations

Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, Washington 98195, USA.

Molecular and Cellular Biology Program, University of Washington, Seattle, Washington 98195, USA.

Publication information

Genome Res. 2024 Nov 20;34(11):2061-2073. doi: 10.1101/gr.279273.124.

Abstract

Fewer than half of individuals with a suspected Mendelian or monogenic condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control data sets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project (1KGP) Oxford Nanopore Technologies Sequencing Consortium aims to generate LRS data from at least 800 of the 1KGP samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37× and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.
