Illumina error profiles: resolving fine-scale variation in metagenomic sequencing data.

Authors

Schirmer Melanie, D'Amore Rosalinda, Ijaz Umer Z, Hall Neil, Quince Christopher

Affiliations

The Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA.

Harvard T.H. Chan School of Public Health, 655 Huntington Ave, Boston, MA 02115, USA.

Publication

BMC Bioinformatics. 2016 Mar 11;17:125. doi: 10.1186/s12859-016-0976-y.

DOI: 10.1186/s12859-016-0976-y
PMID: 26968756
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4787001/

Abstract

BACKGROUND

Illumina's sequencing platforms are currently the most utilised sequencing systems worldwide. The technology has rapidly evolved over recent years and provides high throughput at low costs with increasing read-lengths and true paired-end reads. However, data from any sequencing technology contains noise and our understanding of the peculiarities and sequencing errors encountered in Illumina data has lagged behind this rapid development.

RESULTS

We conducted a systematic investigation of errors and biases in Illumina data based on the largest collection of in vitro metagenomic data sets to date. We evaluated the Genome Analyzer II, HiSeq and MiSeq and tested state-of-the-art low-input library preparation methods. Analysing in vitro metagenomic sequencing data allowed us to determine biases directly associated with the actual sequencing process. The position- and nucleotide-specific analysis revealed a substantial bias related to motifs (the 3-mers preceding errors) ending in "GG". On average, the top three motifs were linked to 16% of all substitution errors. Furthermore, a preferential incorporation of ddGTPs was recorded. We hypothesise that all of these biases are related to the engineered polymerase and ddNTPs, which are intrinsic to any sequencing-by-synthesis method. We show that quality-score-based error removal strategies can on average remove 69% of the substitution errors; however, the motif bias remains.
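
The motif analysis described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes a read already aligned to its reference without gaps, takes the motif from the reference in read direction, and uses a hypothetical Phred threshold (`min_qual`) to stand in for quality-score-based error removal. The function names and the toy sequences are invented for illustration.

```python
from collections import Counter


def preceding_motifs(read, ref, quals=None, min_qual=0, k=3):
    """Count the k-mers that immediately precede each substitution error.

    Assumptions (not from the paper): `read` and `ref` are equal-length,
    gap-free aligned strings; the motif is read from the reference and
    excludes the erroneous position itself. Errors at positions whose
    quality is below `min_qual` are skipped, mimicking a quality filter.
    """
    counts = Counter()
    for i in range(k, len(read)):
        if read[i] == ref[i]:
            continue  # no substitution at this position
        if quals is not None and quals[i] < min_qual:
            continue  # this base would have been removed by the filter
        counts[ref[i - k:i]] += 1
    return counts


def top_share(counts, n=3):
    """Fraction of all counted errors that follow the n most common motifs."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in counts.most_common(n)) / total


def gg_share(counts):
    """Fraction of counted errors whose preceding motif ends in 'GG'."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for m, c in counts.items() if m.endswith("GG")) / total


# Toy data: four substitutions, three of them directly after a "GG".
ref = "ACGGTACGGAATTGGCCATA"
read = "ACGGGACGGCATTGGACAGA"
quals = [30] * 20
quals[18] = 10  # the one error that does not follow "GG" is low quality

raw = preceding_motifs(read, ref)
kept = preceding_motifs(read, ref, quals=quals, min_qual=20)
print(raw, gg_share(raw))    # GG-ending motifs account for 0.75 of errors
print(kept, gg_share(kept))  # after filtering, the share is 1.0
```

In this toy case the quality filter removes one of four errors, yet every remaining error still follows a "GG" motif, which mirrors the abstract's point that filtering reduces the error count without removing the motif bias. Real data would need alignment parsing (indels, strand) before a count like this is meaningful.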

CONCLUSION

Single-nucleotide polymorphism changes in bacterial genomes can cause significant changes in phenotype, including antibiotic resistance and virulence; detecting them within metagenomes is therefore vital. Current error removal techniques are not designed to target the peculiarities encountered in Illumina sequencing data and other sequencing-by-synthesis methods, causing biases to persist and potentially affect any conclusions drawn from the data. In order to develop effective diagnostic and therapeutic approaches, we need to be able to identify systematic sequencing errors and distinguish these errors from true genetic variation.

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/48c5/4787001/8b968a9f3252/12859_2016_976_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/48c5/4787001/b124d42ecaf5/12859_2016_976_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/48c5/4787001/ecf39cde8a2d/12859_2016_976_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/48c5/4787001/8fb76f6dd817/12859_2016_976_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/48c5/4787001/8512b1ead6ac/12859_2016_976_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/48c5/4787001/589d5691988a/12859_2016_976_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/48c5/4787001/f3396b99bd74/12859_2016_976_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/48c5/4787001/16430828b7e7/12859_2016_976_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/48c5/4787001/6b1073a8339d/12859_2016_976_Fig9_HTML.jpg

Similar articles

1
Illumina error profiles: resolving fine-scale variation in metagenomic sequencing data.
BMC Bioinformatics. 2016 Mar 11;17:125. doi: 10.1186/s12859-016-0976-y.
2
Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform.
Nucleic Acids Res. 2015 Mar 31;43(6):e37. doi: 10.1093/nar/gku1341. Epub 2015 Jan 13.
3
Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and genome analyzer systems.
Genome Biol. 2011 Nov 8;12(11):R112. doi: 10.1186/gb-2011-12-11-r112.
4
GC bias affects genomic and metagenomic reconstructions, underrepresenting GC-poor organisms.
Gigascience. 2020 Feb 1;9(2). doi: 10.1093/gigascience/giaa008.
5
Joining Illumina paired-end reads for classifying phylogenetic marker sequences.
BMC Bioinformatics. 2020 Mar 14;21(1):105. doi: 10.1186/s12859-020-3445-6.
6
Identification and correction of systematic error in high-throughput sequence data.
BMC Bioinformatics. 2011 Nov 21;12:451. doi: 10.1186/1471-2105-12-451.
7
Assessment of the cPAS-based BGISEQ-500 platform for metagenomic sequencing.
Gigascience. 2018 Mar 1;7(3):1-8. doi: 10.1093/gigascience/gix133.
8
A novel ultra high-throughput 16S rRNA gene amplicon sequencing library preparation method for the Illumina HiSeq platform.
Microbiome. 2017 Jul 6;5(1):68. doi: 10.1186/s40168-017-0279-1.
9
Comparison of the sequencing bias of currently available library preparation kits for Illumina sequencing of bacterial genomes and metagenomes.
DNA Res. 2019 Oct 1;26(5):391-398. doi: 10.1093/dnares/dsz017.
10
Library Preparation and Sequencing Platform Introduce Bias in Metagenomic-Based Characterizations of Microbiomes.
Microbiol Spectr. 2022 Apr 27;10(2):e0009022. doi: 10.1128/spectrum.00090-22. Epub 2022 Mar 15.

Cited by

1
Off-target sequence variations driven by the intrinsic properties of the Cas-sgRNA-DNA complex in genome editing.
PLoS One. 2025 Jul 18;20(7):e0328905. doi: 10.1371/journal.pone.0328905. eCollection 2025.
2
Swiftly identifying strongly unique k-mers.
Algorithms Mol Biol. 2025 Jul 13;20(1):13. doi: 10.1186/s13015-025-00286-6.
3
Rapid Emergence and Evolution of SARS-CoV-2 Intrahost Variants among COVID-19 Patients with Prolonged Infections, Singapore.
Emerg Infect Dis. 2025 Aug;31(8):1537-1549. doi: 10.3201/eid3108.241419. Epub 2025 Jul 1.
4
BonoboFlow: viral genome assembly and haplotype reconstruction from nanopore reads.
Bioinform Adv. 2025 May 13;5(1):vbaf115. doi: 10.1093/bioadv/vbaf115. eCollection 2025.
5
Preservation of milk in liquid nitrogen during sample collection does not affect the RNA quality for RNA-seq analysis.
BMC Genomics. 2025 May 24;26(1):525. doi: 10.1186/s12864-025-11707-6.
6
SARS-CoV-2 biological clones are genetically heterogeneous and include clade-discordant residues.
J Virol. 2025 May 20;99(5):e0225024. doi: 10.1128/jvi.02250-24. Epub 2025 Apr 24.
7
On the diversity, phylogeny and biogeography of cable bacteria.
Front Microbiol. 2024 Nov 19;15:1485281. doi: 10.3389/fmicb.2024.1485281. eCollection 2024.
8
Cancer liquid biopsies by Oxford Nanopore Technologies sequencing of cell-free DNA: from basic research to clinical applications.
Mol Cancer. 2024 Nov 29;23(1):265. doi: 10.1186/s12943-024-02178-6.
9
The quality and detection limits of mitochondrial heteroplasmy by long read nanopore sequencing.
Sci Rep. 2024 Nov 5;14(1):26778. doi: 10.1038/s41598-024-78270-0.
10
Mosaic of Somatic Mutations in Earth's Oldest Living Organism, Pando.
bioRxiv. 2024 Oct 24:2024.10.19.619233. doi: 10.1101/2024.10.19.619233.

References

1
Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform.
Nucleic Acids Res. 2015 Mar 31;43(6):e37. doi: 10.1093/nar/gku1341. Epub 2015 Jan 13.
2
DNA polymerases drive DNA sequencing-by-synthesis technologies: both past and present.
Front Microbiol. 2014 Jun 24;5:305. doi: 10.3389/fmicb.2014.00305. eCollection 2014.
3
Discovering motifs that induce sequencing errors.
BMC Bioinformatics. 2013;14 Suppl 5(Suppl 5):S1. doi: 10.1186/1471-2105-14-S5-S1. Epub 2013 Apr 10.
4
The history and advances of reversible terminators used in new generations of sequencing technology.
Genomics Proteomics Bioinformatics. 2013 Feb;11(1):34-40. doi: 10.1016/j.gpb.2013.01.003. Epub 2013 Jan 23.
5
Comparative metagenomic and rRNA microbial diversity characterization using archaeal and bacterial synthetic communities.
Environ Microbiol. 2013 Jun;15(6):1882-99. doi: 10.1111/1462-2920.12086. Epub 2013 Feb 6.
6
SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing.
J Comput Biol. 2012 May;19(5):455-77. doi: 10.1089/cmb.2012.0021. Epub 2012 Apr 16.
7
Insertion site preference of Mu, Tn5, and Tn7 transposons.
Mob DNA. 2012 Feb 7;3(1):3. doi: 10.1186/1759-8753-3-3.
8
Identification and correction of systematic error in high-throughput sequence data.
BMC Bioinformatics. 2011 Nov 21;12:451. doi: 10.1186/1471-2105-12-451.
9
Preparation of high-quality next-generation sequencing libraries from picogram quantities of target DNA.
Genome Res. 2012 Jan;22(1):125-33. doi: 10.1101/gr.124016.111. Epub 2011 Nov 16.
10
Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and genome analyzer systems.
Genome Biol. 2011 Nov 8;12(11):R112. doi: 10.1186/gb-2011-12-11-r112.