Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and genome analyzer systems.

Affiliation

Centre for Genomic Regulation (CRG), Barcelona, Spain.

Publication information

Genome Biol. 2011 Nov 8;12(11):R112. doi: 10.1186/gb-2011-12-11-r112.

DOI:10.1186/gb-2011-12-11-r112
PMID:22067484
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3334598/
Abstract

BACKGROUND

The generation and analysis of high-throughput sequencing data are becoming a major component of many studies in molecular biology and medical research. Illumina's Genome Analyzer (GA) and HiSeq instruments are currently the most widely used sequencing devices. Here, we comprehensively evaluate properties of genomic HiSeq and GAIIx data derived from two plant genomes and one virus, with read lengths of 95 to 150 bases.

RESULTS

We provide quantifications and evidence for GC bias, error rates, error sequence context, effects of quality filtering, and the reliability of quality values. By combining different filtering criteria we reduced error rates 7-fold at the expense of discarding 12.5% of alignable bases. While overall error rates are low in HiSeq data we observed regions of accumulated wrong base calls. Only 3% of all error positions accounted for 24.7% of all substitution errors. Analyzing the forward and reverse strands separately revealed error rates of up to 18.7%. Insertions and deletions occurred at very low rates on average but increased to up to 2% in homopolymers. A positive correlation between read coverage and GC content was found depending on the GC content range.
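The trade-off between error reduction and discarded bases can be illustrated with a minimal sketch. It assumes Phred-scaled quality values (Q = -10 log10 p) in Phred+33 ASCII encoding and a single hypothetical cutoff `min_q`; the abstract does not state which filtering criteria the authors combined, and the function names here are illustrative only.

```python
def phred_to_error_prob(q: int) -> float:
    """Convert a Phred quality value to the error probability it claims."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string: str) -> list[int]:
    """Decode a quality string, assuming Phred+33 encoding (an assumption)."""
    return [ord(c) - 33 for c in quality_string]


def expected_error_rate(quals: list[int]) -> float:
    """Mean error probability implied by a list of quality values."""
    if not quals:
        return 0.0
    return sum(phred_to_error_prob(q) for q in quals) / len(quals)


def summarize_filter(quals: list[int], min_q: int = 30) -> dict:
    """Report what a simple per-base quality cutoff costs and gains.

    min_q is a hypothetical threshold, not a value taken from the paper.
    """
    kept = [q for q in quals if q >= min_q]
    before = expected_error_rate(quals)
    after = expected_error_rate(kept)
    return {
        "discarded_fraction": 1 - len(kept) / len(quals) if quals else 0.0,
        "fold_reduction": before / after if after > 0 else float("inf"),
    }


if __name__ == "__main__":
    quals = decode_phred33("III5")  # three Q40 bases and one Q20 base
    print(summarize_filter(quals, min_q=30))
```

This only reflects the error rate the quality values predict. Measuring the observed error rate, as the study does, requires comparing aligned reads against a reference sequence.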

CONCLUSIONS

The errors and biases we report have implications for the use and the interpretation of Illumina sequencing data. GAIIx and HiSeq data sets show slightly different error profiles. Quality filtering is essential to minimize downstream analysis artifacts. Supporting previous recommendations, the strand-specificity provides a criterion to distinguish sequencing errors from low abundance polymorphisms.
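The strand-specificity criterion can be sketched as follows. This is a hedged illustration, not the authors' procedure: it assumes per-strand counts of a non-reference base at one position are already available, and the thresholds `min_depth` and `max_ratio` are hypothetical values chosen for the example.

```python
from typing import Optional


def alt_frequency(alt: int, total: int) -> float:
    """Fraction of reads on one strand that carry the non-reference base."""
    return alt / total if total > 0 else 0.0


def is_strand_specific(
    fwd_alt: int,
    fwd_total: int,
    rev_alt: int,
    rev_total: int,
    min_depth: int = 10,     # hypothetical minimum coverage per strand
    max_ratio: float = 0.2,  # hypothetical imbalance threshold
) -> Optional[bool]:
    """Flag a position whose non-reference base appears mostly on one strand.

    Returns None when either strand has too little coverage to decide.
    A True result suggests a sequencing error rather than a real
    low-abundance polymorphism, which should appear on both strands.
    """
    if fwd_total < min_depth or rev_total < min_depth:
        return None
    fwd = alt_frequency(fwd_alt, fwd_total)
    rev = alt_frequency(rev_alt, rev_total)
    high, low = max(fwd, rev), min(fwd, rev)
    if high == 0:
        return False  # no non-reference bases on either strand
    return low / high < max_ratio


if __name__ == "__main__":
    print(is_strand_specific(18, 100, 0, 100))  # seen on forward strand only
    print(is_strand_specific(5, 100, 6, 100))   # balanced across strands
```

A ratio test keeps the sketch dependency-free. A statistical test on the 2x2 table of strand by allele counts would be a reasonable alternative when coverage is low.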

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e338/3334598/ed8de017023d/gb-2011-12-11-r112-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e338/3334598/847b663886a6/gb-2011-12-11-r112-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e338/3334598/693f020be3b6/gb-2011-12-11-r112-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e338/3334598/13e97d7d6a7b/gb-2011-12-11-r112-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e338/3334598/b036b1069518/gb-2011-12-11-r112-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e338/3334598/57c6279fa0ab/gb-2011-12-11-r112-6.jpg