A comparative investigation of variant calling and genotyping for a single non-Caucasian whole genome.

Authors

Park HyeonSeul, Gim JungSoo

Affiliation

Chosun University.

Publication

Res Sq. 2023 Mar 6:rs.3.rs-2580940. doi: 10.21203/rs.3.rs-2580940/v1.

Abstract

Most genome benchmark studies use hg38 as the reference genome (based on Caucasian and African samples) and 'NA12878' (sequencing reads from a Caucasian individual) for comparison. Here, we aimed to elucidate whether 1) an ethnic match or mismatch between the reference genome and the sequencing reads produces a distinct result and 2) there is an optimal workflow for single-genome data. We assessed the performance of variant calling pipelines using two reference genomes (hg38 and a Korean genome) and two whole-genome sequencing (WGS) read sets of different ethnic origins: Caucasian (NA12878) and Korean. The pipelines used BWA-mem and Novoalign as mapping tools and GATK4, Strelka2, DeepVariant, and Samtools as variant callers. Using hg38 led to better performance (based on precision and recall), regardless of the ethnic origin of the WGS reads. Novoalign + GATK4 demonstrated the best performance with both WGS data sets. We assessed pipeline efficiency by removing the duplicate-marking (markduplicate) step, and all pipelines except Novoalign + DeepVariant maintained their performance. Novoalign identified more variants overall and in specific regions of chr6 when combined with GATK4. No evidence suggested that aligning single WGS reads to a reference of a different ethnic origin improved variant calling performance, which re-validates the utility of hg38. We recommend using Novoalign + GATK4 without duplicate marking for single PCR-free WGS data.
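The abstract ranks pipelines by precision and recall. As a minimal sketch of how those two metrics can be computed from a call set and a truth set, the Python below treats each variant as a `(chrom, pos, ref, alt)` tuple and compares by exact match. The function name `evaluate_calls`, the `Metrics` type, and the toy variants are hypothetical and chosen for illustration only; this is not the authors' evaluation procedure, and practical benchmarking also has to normalize variant representation and restrict comparison to confident regions.

```python
from typing import NamedTuple, Set, Tuple

# A variant is identified here by (chromosome, position, ref allele, alt allele).
# Exact-tuple matching is a simplifying assumption of this sketch.
Variant = Tuple[str, int, str, str]


class Metrics(NamedTuple):
    precision: float
    recall: float
    f1: float


def evaluate_calls(called: Set[Variant], truth: Set[Variant]) -> Metrics:
    """Compare a pipeline's call set against a truth set by exact match."""
    tp = len(called & truth)   # called and present in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # in the truth set but not called
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return Metrics(precision, recall, f1)


# Toy data (invented for illustration, not taken from the study).
truth = {
    ("chr6", 100, "A", "G"),
    ("chr6", 200, "C", "T"),
    ("chr1", 50, "G", "A"),
    ("chr2", 75, "T", "C"),
}
called = {
    ("chr6", 100, "A", "G"),
    ("chr6", 200, "C", "T"),
    ("chr1", 50, "G", "A"),
    ("chr3", 10, "A", "T"),
}

m = evaluate_calls(called, truth)
print(f"precision={m.precision:.2f} recall={m.recall:.2f} f1={m.f1:.2f}")
```

With three shared variants, one extra call, and one missed truth variant, both precision and recall come out to 3/4 in this toy case. Running the same comparison for each mapper, caller, and reference combination would give a table of the kind the abstract summarizes.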

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ec1/10029055/fc0405e7a29d/nihpp-rs2580940v1-f0001.jpg
