Recent advances in genome sequencing have improved variant calling in complex regions of the human genome. However, it is difficult to quantify variant calling performance because existing standards often focus on specificity, neglecting completeness in difficult-to-analyze regions. To create a more comprehensive truth set, we used Mendelian inheritance in a large pedigree (CEPH-1463) to filter variants across PacBio high-fidelity (HiFi), Illumina and Oxford Nanopore Technologies platforms. This generated a variant map with over 4.7 million single-nucleotide variants, 767,795 insertions and deletions (indels), 537,486 tandem repeats and 24,315 structural variants, covering 2.77 Gb of the GRCh38 genome. This work adds ~200 Mb of high-confidence regions, including 8% more small variants, and introduces the first tandem repeat and structural variant truth sets for NA12878 and her family. As an example of the value of this improved benchmark, we retrained DeepVariant using these data to reduce genotyping errors by ~34%.

The Platinum Pedigree: a long-read benchmark for genetic variants.
