An investigation of causes of false positive single nucleotide polymorphisms using simulated reads from a small eukaryote genome.

Authors

Ribeiro Antonio, Golicz Agnieszka, Hackett Christine Anne, Milne Iain, Stephen Gordon, Marshall David, Flavell Andrew J, Bayer Micha

Affiliations

The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK.

Division of Plant Sciences, University of Dundee at JHI, Invergowrie, Dundee, DD2 5DA, Scotland, UK.

Publication information

BMC Bioinformatics. 2015 Nov 11;16:382. doi: 10.1186/s12859-015-0801-z.


DOI: 10.1186/s12859-015-0801-z
PMID: 26558718
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4642669/
Abstract

BACKGROUND: Single Nucleotide Polymorphisms (SNPs) are widely used molecular markers, and their use has increased massively since the inception of Next Generation Sequencing (NGS) technologies, which allow detection of large numbers of SNPs at low cost. However, both NGS data and their analysis are error-prone, which can lead to the generation of false positive (FP) SNPs. We explored the relationship between FP SNPs and seven factors involved in mapping-based variant calling: quality of the reference sequence, read length, choice of mapper and variant caller, mapping stringency and filtering of SNPs by read mapping quality and read depth. This resulted in 576 possible factor level combinations. We used error- and variant-free simulated reads to ensure that every SNP found was indeed a false positive.

RESULTS: The variation in the number of FP SNPs generated ranged from 0 to 36,621 for the 120 million base pairs (Mbp) genome. All of the experimental factors tested had statistically significant effects on the number of FP SNPs generated and there was a considerable amount of interaction between the different factors. Using a fragmented reference sequence led to a dramatic increase in the number of FP SNPs generated, as did relaxed read mapping and a lack of SNP filtering. The choice of reference assembler, mapper and variant caller also significantly affected the outcome. The effect of read length was more complex and suggests a possible interaction between mapping specificity and the potential for contributing more false positives as read length increases.

CONCLUSIONS: The choice of tools and parameters involved in variant calling can have a dramatic effect on the number of FP SNPs produced, with particularly poor combinations of software and/or parameter settings yielding tens of thousands in this experiment. Between-factor interactions make simple recommendations difficult for a SNP discovery pipeline but the quality of the reference sequence is clearly of paramount importance. Our findings are also a stark reminder that it can be unwise to use the relaxed mismatch settings provided as defaults by some read mappers when reads are being mapped to a relatively unfinished reference sequence from e.g. a non-model organism in its early stages of genomic exploration.
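
The full-factorial design described in the abstract (seven factors, 576 factor level combinations) can be illustrated with a minimal sketch. The abstract names the factors but not their levels, so every level label and level count below is a hypothetical placeholder, chosen only so that the product equals 576; the paper's actual levels may differ.

```python
from itertools import product
from math import prod

# Hypothetical levels per factor. Only the seven factor names and the total of
# 576 combinations come from the abstract; the labels and counts are assumptions.
FACTORS = {
    "reference_quality": ["ref_a", "ref_b", "ref_c", "ref_d"],
    "read_length": ["short", "medium", "long"],
    "mapper": ["mapper_a", "mapper_b", "mapper_c"],
    "variant_caller": ["caller_a", "caller_b"],
    "mapping_stringency": ["strict", "relaxed"],
    "mapq_filter": [False, True],
    "depth_filter": [False, True],
}


def factor_combinations(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


combos = factor_combinations(FACTORS)

# The number of runs in a full-factorial design is the product of level counts.
expected_runs = prod(len(levels) for levels in FACTORS.values())
print(len(combos), expected_runs)
```

Each dictionary in `combos` corresponds to one pipeline configuration for which FP SNPs would be counted. Enumerating all combinations, rather than varying one factor at a time, is what makes it possible to detect the between-factor interactions the abstract reports.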

Similar articles

[1]
An investigation of causes of false positive single nucleotide polymorphisms using simulated reads from a small eukaryote genome.

BMC Bioinformatics. 2015-11-11

[2]
Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence.

BMC Genomics. 2011-1-25

[3]
Specificity control for read alignments using an artificial reference genome-guided false discovery rate.

Bioinformatics. 2013-5-17

[4]
Whole-Genome Sequence Accuracy Is Improved by Replication in a Population of Mutagenized Sorghum.

G3 (Bethesda). 2018-3-2

[5]
Improving mapping and SNP-calling performance in multiplexed targeted next-generation sequencing.

BMC Genomics. 2012-8-22

[6]
Read trimming has minimal effect on bacterial SNP-calling accuracy.

Microb Genom. 2020-12

[7]
A high-throughput SNP discovery strategy for RNA-seq data.

BMC Genomics. 2019-2-27

[8]
Coverage-based consensus calling (CbCC) of short sequence reads and comparison of CbCC results to identify SNPs in chickpea (Cicer arietinum; Fabaceae), a crop species without a reference genome.

Am J Bot. 2012-2-1

[9]
Choice of reference sequence and assembler for alignment of Listeria monocytogenes short-read sequence data greatly influences rates of error in SNP analyses.

PLoS One. 2014-8-21

[10]
Evaluation and assessment of read-mapping by multiple next-generation sequencing aligners based on genome-wide characteristics.

Genomics. 2017-7

Cited by

[1]
FST-Based Marker Prioritization Within Quantitative Trait Loci Regions and Its Impact on Genomic Selection Accuracy: Insights from a Simulation Study with High-Density Marker Panels for Bovines.

Genes (Basel). 2025-5-10

[2]
Whole-genome comparison using complete genomes from strains revealed single nucleotide polymorphisms on non-genomic islands for subspecies differentiation.

Front Microbiol. 2024-9-12

[3]
Uncovering Footprints of Natural Selection Through Spectral Analysis of Genomic Summary Statistics.

Mol Biol Evol. 2023-7-5

[4]
SNP4OrphanSpecies: A bioinformatics pipeline to isolate molecular markers for studying genetic diversity of orphan species.

Biodivers Data J. 2022-8-24

[5]
A unique Toxoplasma gondii haplotype accompanied the global expansion of cats.

Nat Commun. 2022-10-1

[6]
The evolutionary patterns of barley pericentromeric chromosome regions, as shaped by linkage disequilibrium and domestication.

Plant J. 2022-9

[7]
Sequences to Differences in Gene Expression: Analysis of RNA-Seq Data.

Methods Mol Biol. 2022

[8]
Generalizable characteristics of false-positive bacterial variant calls.

Microb Genom. 2021-8

[9]
DEEPGEN-A Novel Variant Calling Assay for Low Frequency Variants.

Genes (Basel). 2021-3-30

[10]
Comparative Transcriptomics and RNA-Seq-Based Bulked Segregant Analysis Reveals Genomic Basis Underlying Virulence.

Front Microbiol. 2021-2-22
