Heap: a highly sensitive and accurate SNP detection tool for low-coverage high-throughput sequencing data.

Author information

Kobayashi Masaaki, Ohyanagi Hajime, Takanashi Hideki, Asano Satomi, Kudo Toru, Kajiya-Kanegae Hiromi, Nagano Atsushi J, Tainaka Hitoshi, Tokunaga Tsuyoshi, Sazuka Takashi, Iwata Hiroyoshi, Tsutsumi Nobuhiro, Yano Kentaro

Affiliations

Bioinformatics Laboratory, Department of Life Sciences, School of Agriculture, Meiji University, Kanagawa 214-8571, Japan.

King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal 23955-6900, Kingdom of Saudi Arabia.

Publication information

DNA Res. 2017 Aug 1;24(4):397-405. doi: 10.1093/dnares/dsx012.

DOI: 10.1093/dnares/dsx012

PMID: 28498906

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5737671/

Abstract

Recent availability of large-scale genomic resources enables us to conduct so-called genome-wide association studies (GWAS) and genomic prediction (GP) studies, particularly with next-generation sequencing (NGS) data. The effectiveness of GWAS and GP depends not only on their mathematical models but also on the quality and quantity of the variants employed in the analysis. In NGS single nucleotide polymorphism (SNP) calling, conventional tools ideally require more reads for higher SNP sensitivity and accuracy. In this study, we aimed to develop a tool, Heap, that enables robustly sensitive and accurate SNP calling, particularly from low-coverage NGS data, which must be aligned to the reference genome sequences in advance. To reduce false-positive SNPs, Heap determines genotypes and calls SNPs at each site, except for sites at either end of a read or sites containing a minor allele supported by only one read. A performance comparison with existing tools showed that Heap achieved the highest F-scores with low-coverage (7X) restriction-site associated DNA sequencing reads of sorghum and rice individuals. This will facilitate cost-effective GWAS and GP studies in the NGS era. Code and documentation of Heap are freely available from https://github.com/meiji-bioinf/heap (29 March 2017, date last accessed) and our website (http://bioinf.mind.meiji.ac.jp/lab/en/tools.html (29 March 2017, date last accessed)).
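The two site-exclusion rules in the abstract, and the F-score used to compare tools, can be illustrated with a small Python sketch. This is a hypothetical illustration, not Heap's implementation: the `ReadBase` record, the `call_site` and `f_score` helpers, the one-base `margin` that defines a "read end", and the simple count-based genotype rule are all assumptions made for the example. The F-score helper uses the standard F1 definition (harmonic mean of precision and recall); the abstract does not state the exact formula used in the paper.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ReadBase:
    """One read base aligned to a genomic site (hypothetical structure, not Heap's)."""
    allele: str    # observed nucleotide, e.g. "A"
    offset: int    # 0-based position of this base within its read
    read_len: int  # length of the read


def at_read_end(b: ReadBase, margin: int = 1) -> bool:
    # Assumption: "end of a read" means the first or last `margin` bases.
    return b.offset < margin or b.offset >= b.read_len - margin


def call_site(bases: List[ReadBase], margin: int = 1) -> Optional[str]:
    """Return a genotype such as "A/A" or "A/G", or None if the site is skipped."""
    if not bases:
        return None
    # Rule 1: skip the site if it falls at either end of any covering read.
    if any(at_read_end(b, margin) for b in bases):
        return None
    ranked = Counter(b.allele for b in bases).most_common()
    # Rule 2: skip the site if a minor allele is supported by only one read.
    if any(count == 1 for _, count in ranked[1:]):
        return None
    # Simplified genotyping (an assumption, not Heap's model):
    # one allele -> homozygous, two alleles -> heterozygous, otherwise no call.
    if len(ranked) == 1:
        a = ranked[0][0]
        return f"{a}/{a}"
    if len(ranked) == 2:
        a, b = sorted(allele for allele, _ in ranked)
        return f"{a}/{b}"
    return None


def f_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    het = [ReadBase("A", 10, 100), ReadBase("G", 42, 100),
           ReadBase("G", 57, 100), ReadBase("A", 80, 100)]
    print(call_site(het))        # A/G
    singleton = [ReadBase("A", 10, 100), ReadBase("A", 20, 100),
                 ReadBase("T", 30, 100)]
    print(call_site(singleton))  # None (minor allele T seen in one read)
    edge = [ReadBase("C", 0, 100), ReadBase("C", 20, 100)]
    print(call_site(edge))       # None (site at the start of a read)
    print(round(f_score(tp=90, fp=10, fn=20), 3))  # 0.857
```

Requiring every covering base to lie away from the read ends is a conservative reading of the abstract; an alternative reading would discard only the read-end bases and still genotype the site from the remaining reads.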

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f03/5737671/df7e14bda494/dsx012f1.jpg
