Leveraging long read sequencing from a single individual to provide a comprehensive resource for benchmarking variant calling methods.

Author information

Mu John C, Tootoonchi Afshar Pegah, Mohiyuddin Marghoob, Chen Xi, Li Jian, Bani Asadi Narges, Gerstein Mark B, Wong Wing H, Lam Hugo Y K

Affiliations

Bina Technologies, Roche Sequencing, Redwood City, CA 94065, USA.

Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA.

Publication information

Sci Rep. 2015 Sep 28;5:14493. doi: 10.1038/srep14493.

Abstract

A high-confidence, comprehensive human variant set is critical for assessing the accuracy of sequencing algorithms, which are crucial to precision medicine based on high-throughput sequencing. Although recent works have attempted to provide such a resource, they still do not encompass all major types of variants, including structural variants (SVs). We therefore leveraged the massive set of high-quality Sanger sequences from the HuRef genome to construct by far the most comprehensive gold set for a single individual, which was cross-validated with deep Illumina sequencing, population datasets, and well-established algorithms. A complete reanalysis of the HuRef genome was necessary because its previously published variants were mostly reported five years ago and suffered from compatibility, organization, and accuracy issues that prevent their direct use in benchmarking. Our extensive analysis and validation resulted in a gold set with high specificity and sensitivity. In contrast to the current gold sets of the NA12878 or HS1011 genomes, our gold set is the first that includes small variants, deletion SVs, and insertion SVs up to a hundred thousand base-pairs. We demonstrate the utility of our HuRef gold set for benchmarking several published SV detection tools.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8d03/4585973/7ccc2a812ae4/srep14493-f1.jpg
