Reproducibility of Illumina platform deep sequencing errors allows accurate determination of DNA barcodes in cells.

Authors

Beltman Joost B, Urbanus Jos, Velds Arno, van Rooij Nienke, Rohr Jan C, Naik Shalin H, Schumacher Ton N

Affiliations

Division of Immunology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands.

Division of Toxicology, Leiden Academic Centre for Drug Research, Leiden University, 2333 CC, Leiden, The Netherlands.

Publication information

BMC Bioinformatics. 2016 Apr 2;17:151. doi: 10.1186/s12859-016-0999-4.

DOI:10.1186/s12859-016-0999-4

PMID:27038897

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4818877/

Abstract

BACKGROUND

Next generation sequencing (NGS) of amplified DNA is a powerful tool to describe genetic heterogeneity within cell populations. It can be used both to investigate the clonal structure of cell populations and to perform genetic lineage tracing. For applications in which both abundant and rare sequences are biologically relevant, the relatively high error rate of NGS techniques complicates data analysis, as it is difficult to distinguish rare true sequences from spurious sequences that are generated by PCR or sequencing errors. This issue applies, for instance, to cellular barcoding strategies that aim to follow the amount and type of offspring of single cells by supplying these cells with unique heritable DNA tags.

RESULTS

Here, we use genetic barcoding data from the Illumina HiSeq platform to show that straightforward read threshold-based filtering of data is typically insufficient to filter out spurious barcodes. Importantly, we demonstrate that specific sequencing errors occur at an approximately constant rate across different samples that are sequenced in parallel. We exploit this observation by developing a novel approach to filter out spurious sequences.

CONCLUSIONS

Application of our new method demonstrates its value in the identification of true sequences amongst spurious sequences in biological data sets.
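
The idea stated in RESULTS, that a specific sequencing error recurs at a roughly constant rate across samples sequenced in parallel, can be illustrated with a small sketch. This is not the authors' published algorithm: the abstract gives no implementation details, so the function names, the Hamming-distance-1 "parent/child" pairing, the thresholds (min_parent_reads, max_ratio, max_cv) and the toy read counts below are all assumptions made for illustration only.

```python
from itertools import combinations
from statistics import mean, pstdev


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def threshold_filter(sample, min_reads):
    """Naive filter: keep every barcode with at least min_reads reads."""
    return {bc for bc, n in sample.items() if n >= min_reads}


def flag_spurious(counts, min_parent_reads=1000, max_ratio=0.05, max_cv=0.25):
    """Flag barcodes that look like reproducible errors of a parent barcode.

    counts maps sample name -> {barcode: read count}. A barcode is flagged
    when it differs by one base from a more abundant barcode (the assumed
    parent) and its child/parent read ratio is small and nearly constant
    across the samples in which the parent is well covered.
    All thresholds are hypothetical defaults, not values from the paper.
    """
    barcodes = sorted({bc for sample in counts.values() for bc in sample})
    totals = {bc: sum(s.get(bc, 0) for s in counts.values()) for bc in barcodes}
    spurious = set()
    for a, b in combinations(barcodes, 2):
        if len(a) != len(b) or hamming(a, b) != 1:
            continue
        parent, child = (a, b) if totals[a] >= totals[b] else (b, a)
        ratios = [
            s.get(child, 0) / s[parent]
            for s in counts.values()
            if s.get(parent, 0) >= min_parent_reads
        ]
        if len(ratios) < 2:
            continue  # need several samples to judge reproducibility
        m = mean(ratios)
        # Coefficient of variation measures how constant the ratio is.
        if 0 < m <= max_ratio and pstdev(ratios) / m <= max_cv:
            spurious.add(child)
    return spurious


# Toy data (invented): ACGA tracks ACGT at about 1% in every sample,
# while ACGG and TTTT vary independently of ACGT.
counts = {
    "s1": {"ACGT": 10000, "ACGA": 100, "ACGG": 30, "TTTT": 50},
    "s2": {"ACGT": 20000, "ACGA": 210, "ACGG": 2000},
    "s3": {"ACGT": 5000, "ACGA": 48, "TTTT": 900},
}

print(flag_spurious(counts))
```

In this toy data a plain read threshold of 100 in sample s2 would keep ACGA, whereas the ratio-based check flags it because its abundance relative to ACGT is nearly identical in all three samples. ACGG is also one base away from ACGT but is not flagged, since its ratio to ACGT varies widely between samples.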

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0988/4818877/a4cd60b97936/12859_2016_999_Fig1_HTML.jpg
