SAMStat: monitoring biases in next generation sequencing data.

Affiliation

Omics Science Center, Riken Yokohama Institute, Tsurumi-ku, Yokohama, Japan.

Publication information

Bioinformatics. 2011 Jan 1;27(1):130-1. doi: 10.1093/bioinformatics/btq614. Epub 2010 Nov 18.

DOI:10.1093/bioinformatics/btq614

PMID:21088025

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3008642/

Abstract

MOTIVATION

The Sequence Alignment/Map (SAM) format is commonly used to store the alignments between millions of short reads and a reference genome. Often, certain positions within the reads are inherently more likely to contain errors because of the protocols used to prepare the samples. Such biases can have adverse effects on both mapping rate and accuracy. To understand the relationship between potential protocol biases and poor mapping, we wrote SAMStat, a simple C program that plots nucleotide overrepresentation and other statistics for mapped and unmapped reads in a concise HTML page. Collecting such statistics also makes it easy to highlight problems in the data processing and enables non-experts to track data quality over time.
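As a rough illustration of the kind of statistic described above, the following Python sketch tallies nucleotide counts per read position, separately for mapped and unmapped reads. It is not SAMStat's implementation (SAMStat is a C program); the function names `position_counts` and `position_frequencies` are hypothetical. It relies only on the SAM format itself: tab-separated fields, header lines starting with `@`, FLAG bit 0x4 marking an unmapped read, and FLAG bit 0x10 marking a read whose SEQ is stored reverse-complemented.

```python
from collections import Counter

UNMAPPED = 0x4   # SAM FLAG bit: read is unmapped
REVERSE = 0x10   # SAM FLAG bit: SEQ is stored reverse-complemented

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def position_counts(sam_lines):
    """Count nucleotides per read position for mapped and unmapped reads."""
    counts = {"mapped": [], "unmapped": []}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag, seq = int(fields[1]), fields[9].upper()
        if seq == "*":
            continue  # sequence not stored
        if flag & REVERSE:
            # Restore the original read orientation so that position 0
            # is always the first sequenced base.
            seq = seq.translate(_COMPLEMENT)[::-1]
        per_pos = counts["unmapped" if flag & UNMAPPED else "mapped"]
        while len(per_pos) < len(seq):
            per_pos.append(Counter())
        for i, base in enumerate(seq):
            per_pos[i][base] += 1
    return counts


def position_frequencies(per_pos):
    """Convert per-position counts into per-position relative frequencies."""
    return [
        {base: n / sum(c.values()) for base, n in c.items()}
        for c in per_pos
    ]
```

Comparing the frequencies of the two groups position by position is one simple way to spot a base that is overrepresented at a given position among unmapped reads.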

RESULTS

We demonstrate that studying sequence features in mapped data can be used to identify biases particular to one sequencing protocol. Once identified, such biases can be considered in the downstream analysis or even be removed by read trimming or filtering techniques.
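The trimming step mentioned above can be sketched as follows. This is a minimal illustration, not a feature of SAMStat: the helper `trim_read` and its parameters `n_start` and `n_end` are hypothetical, and the number of bases to remove would have to be chosen from the biases actually observed in the data.

```python
def trim_read(seq, qual, n_start=0, n_end=0):
    """Remove n_start bases from the start and n_end bases from the end of a read.

    The sequence and its quality string are trimmed together so that
    they stay the same length.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality string differ in length")
    if n_start < 0 or n_end < 0:
        raise ValueError("trim lengths must be non-negative")
    end = max(len(seq) - n_end, 0)  # never let the end index go negative
    return seq[n_start:end], qual[n_start:end]
```

For example, `trim_read("NNACGT", "!!IIII", n_start=2)` drops the two leading bases and their quality values. Reads that become too short after trimming would normally be filtered out before re-mapping.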

AVAILABILITY

SAMStat is open source and freely available as a C program running on all Unix-compatible platforms. The source code is available from http://samstat.sourceforge.net.

CONTACT

timolassmann@gmail.com.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd2c/3008642/03cc4a771c17/btq614f1.jpg

Similar articles

SAMStat: monitoring biases in next generation sequencing data.

Bioinformatics. 2011 Jan 1;27(1):130-1. doi: 10.1093/bioinformatics/btq614. Epub 2010 Nov 18.

SAMStat 2: quality control for next generation sequencing data.

Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btad019.

Re-alignment of the unmapped reads with base quality score.

BMC Bioinformatics. 2015;16 Suppl 5(Suppl 5):S8. doi: 10.1186/1471-2105-16-S5-S8. Epub 2015 Mar 18.

Qualimap: evaluating next-generation sequencing alignment data.

Bioinformatics. 2012 Oct 15;28(20):2678-9. doi: 10.1093/bioinformatics/bts503. Epub 2012 Aug 22.

Accurate estimation of short read mapping quality for next-generation genome sequencing.

Bioinformatics. 2012 Sep 15;28(18):i349-i355. doi: 10.1093/bioinformatics/bts408.

The Sequence Alignment/Map format and SAMtools.

Bioinformatics. 2009 Aug 15;25(16):2078-9. doi: 10.1093/bioinformatics/btp352. Epub 2009 Jun 8.

BamView: viewing mapped read alignment data in the context of the reference sequence.

Bioinformatics. 2010 Mar 1;26(5):676-7. doi: 10.1093/bioinformatics/btq010. Epub 2010 Jan 12.

GenomeView: a next-generation genome browser.

Nucleic Acids Res. 2012 Jan;40(2):e12. doi: 10.1093/nar/gkr995. Epub 2011 Nov 18.

SQUAT: a Sequencing Quality Assessment Tool for data quality assessments of genome assemblies.

BMC Genomics. 2019 Apr 18;19(Suppl 9):238. doi: 10.1186/s12864-019-5445-3.

BamView: visualizing and interpretation of next-generation sequencing read alignments.

Brief Bioinform. 2013 Mar;14(2):203-12. doi: 10.1093/bib/bbr073. Epub 2012 Jan 16.

Cited by

Comparative transcriptomics of salinomycin molecular toxicity in chicken and turkey.

Sci Rep. 2025 Jul 1;15(1):21586. doi: 10.1038/s41598-025-08812-7.

Partial amelioration of a chronic cigarette-smoke-induced phenotype in mice by switching to electronic cigarettes.

Arch Toxicol. 2025 Apr 18. doi: 10.1007/s00204-025-04055-7.

A single rare σ70 variant establishes a unique gene expression pattern in the E. coli pathobiont LF82.

Nucleic Acids Res. 2024 Oct 28;52(19):11552-11570. doi: 10.1093/nar/gkae773.

Transcriptomic profiling reveals histone acetylation-regulated genes involved in somatic embryogenesis in Arabidopsis thaliana.

BMC Genomics. 2024 Aug 15;25(1):788. doi: 10.1186/s12864-024-10623-5.

Sex-specific DNA-replication in the early mammalian embryo.

Nat Commun. 2024 Jul 27;15(1):6323. doi: 10.1038/s41467-024-50727-w.

Comparative Genomic Analysis of Bacterial Data in BV-BRC: An Example Exploring Antimicrobial Resistance.

Methods Mol Biol. 2024;2802:547-571. doi: 10.1007/978-1-0716-3838-5_18.

Genomic analysis of fruit size and shape traits in apple: unveiling candidate genes through GWAS analysis.

Hortic Res. 2023 Dec 19;11(2):uhad270. doi: 10.1093/hr/uhad270. eCollection 2024 Feb.

Gene editing and cardiac disease modelling for the interpretation of genetic variants of uncertain significance in congenital heart disease.

Stem Cell Res Ther. 2023 Dec 5;14(1):345. doi: 10.1186/s13287-023-03592-1.

SAMStat 2: quality control for next generation sequencing data.

Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btad019.

Loss of SNAI1 induces cellular plasticity in invasive triple-negative breast cancer cells.

Cell Death Dis. 2022 Sep 28;13(9):832. doi: 10.1038/s41419-022-05280-z.

References

Linking promoters to functional transcripts in small samples with nanoCAGE and CAGEscan.

Nat Methods. 2010 Jul;7(7):528-34. doi: 10.1038/nmeth.1470. Epub 2010 Jun 13.

The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants.

Nucleic Acids Res. 2010 Apr;38(6):1767-71. doi: 10.1093/nar/gkp1137. Epub 2009 Dec 16.

The Sequence Alignment/Map format and SAMtools.

Bioinformatics. 2009 Aug 15;25(16):2078-9. doi: 10.1093/bioinformatics/btp352. Epub 2009 Jun 8.

Fast and accurate short read alignment with Burrows-Wheeler transform.

Bioinformatics. 2009 Jul 15;25(14):1754-60. doi: 10.1093/bioinformatics/btp324. Epub 2009 May 18.

How to map billions of short reads onto genomes.

Nat Biotechnol. 2009 May;27(5):455-7. doi: 10.1038/nbt0509-455.

Mapping short DNA sequencing reads and calling variants using mapping quality scores.

Genome Res. 2008 Nov;18(11):1851-8. doi: 10.1101/gr.078212.108. Epub 2008 Aug 19.

A code for transcription initiation in mammalian genomes.

Genome Res. 2008 Jan;18(1):1-12. doi: 10.1101/gr.6831208. Epub 2007 Nov 21.

Genome-wide analysis of mammalian promoter architecture and evolution.

Nat Genet. 2006 Jun;38(6):626-35. doi: 10.1038/ng1789. Epub 2006 Apr 28.
