Heap: a highly sensitive and accurate SNP detection tool for low-coverage high-throughput sequencing data.

Authors

Kobayashi Masaaki, Ohyanagi Hajime, Takanashi Hideki, Asano Satomi, Kudo Toru, Kajiya-Kanegae Hiromi, Nagano Atsushi J, Tainaka Hitoshi, Tokunaga Tsuyoshi, Sazuka Takashi, Iwata Hiroyoshi, Tsutsumi Nobuhiro, Yano Kentaro

Affiliations

Bioinformatics Laboratory, Department of Life Sciences, School of Agriculture, Meiji University, Kanagawa 214-8571, Japan.

King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal 23955-6900, Kingdom of Saudi Arabia.

Publication information

DNA Res. 2017 Aug 1;24(4):397-405. doi: 10.1093/dnares/dsx012.

Abstract

The recent availability of large-scale genomic resources enables us to conduct so-called genome-wide association studies (GWAS) and genomic prediction (GP) studies, particularly with next-generation sequencing (NGS) data. The effectiveness of GWAS and GP depends not only on their mathematical models but also on the quality and quantity of the variants used in the analysis. In NGS single nucleotide polymorphism (SNP) calling, conventional tools ideally require more reads to reach higher SNP sensitivity and accuracy. In this study, we aimed to develop a tool, Heap, that enables robustly sensitive and accurate SNP calling, particularly from low-coverage NGS data, which must be aligned to the reference genome sequence in advance. To reduce false-positive SNPs, Heap determines genotypes and calls SNPs at each site, except for sites at either end of a read and sites containing a minor allele supported by only one read. A performance comparison with existing tools showed that Heap achieved the highest F-scores on low-coverage (7X) restriction-site associated DNA sequencing reads of sorghum and rice individuals. Heap should therefore facilitate cost-effective GWAS and GP studies in the NGS era. Code and documentation for Heap are freely available from https://github.com/meiji-bioinf/heap and from our website, http://bioinf.mind.meiji.ac.jp/lab/en/tools.html (both last accessed 29 March 2017).
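
The two exclusion rules named in the abstract can be illustrated with a minimal Python sketch. This is not Heap's code: the `Observation` record, the `usable` and `call_site` functions, and the `end_margin` parameter are hypothetical names introduced here. The sketch also assumes that "sites at the ends of reads" means a base is ignored when it falls within `end_margin` positions of either end of its read, and that the "minor allele" is the least-supported allele among the remaining bases. The abstract does not specify these details.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional, Tuple


class Observation(NamedTuple):
    """One aligned base covering a site (hypothetical record, not Heap's format)."""
    base: str      # base observed in the read at this site
    read_pos: int  # 0-based offset of the site within the read
    read_len: int  # length of the read


def usable(obs: Observation, end_margin: int = 1) -> bool:
    """Keep a base only if it is at least `end_margin` positions from both read ends."""
    return end_margin <= obs.read_pos < obs.read_len - end_margin


def call_site(
    observations: Iterable[Observation],
    ref_base: str,
    end_margin: int = 1,  # assumed value; the abstract gives no number
) -> Optional[Tuple[str, ...]]:
    """Return the alleles seen at a site if it passes both filters and differs
    from the reference; otherwise return None."""
    # Rule 1 (assumed reading): ignore bases that sit at the ends of reads.
    counts = Counter(o.base for o in observations if usable(o, end_margin))
    if not counts:
        return None  # no usable coverage at this site

    # Rule 2: skip the site if its least-supported allele has only one read.
    if len(counts) > 1 and min(counts.values()) == 1:
        return None

    alleles = tuple(sorted(counts))
    if alleles == (ref_base,):
        return None  # only the reference allele was seen, so no SNP
    return alleles


if __name__ == "__main__":
    site = [Observation("A", 5, 50)] * 3 + [Observation("G", 10, 50)] * 2
    print(call_site(site, ref_base="A"))
```

With three mid-read `A` bases and two mid-read `G` bases, the site passes both filters. If one `G` observation is removed, or if both `G` bases sit at read ends, the same call returns `None`. A real caller would also weigh base and mapping qualities, which this sketch omits.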


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f03/5737671/df7e14bda494/dsx012f1.jpg
