Analysis of Software Read Cross-Contamination in DNBSEQ Data.

Authors

Konanov Dmitry N, Tereshchuk Vera Y, Sonets Ignat V, Korneenko Elena V, Lukina-Gronskaya Aleksandra V, Speranskaya Anna S, Ilina Elena N

Affiliations

Research Institute for System Biology and Medicine, Moscow 117246, Russia.

Phystech School of Biological and Medical Physics of MIPT, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia.

Publication information

Biology (Basel). 2025 Jun 9;14(6):670. doi: 10.3390/biology14060670.

DOI:10.3390/biology14060670

PMID:40563921

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12189395/

Abstract

DNA nanoball sequencing (DNBSEQ) is one of the most rapidly developing sequencing technologies and is widely applied in genomic and transcriptomic investigations. Recently, a new PE300 sequencing option, primarily recommended for amplicon analysis, was released for DNBSEQ-G99 and G400 devices. Given their unprecedentedly high data yield per flow cell, the new PE300 kits could be a great choice for various sequencing tasks, but we found that combining different types of DNA libraries in a single run could lead to undesired artifacts in the data. In this study, we investigate the occasional read cross-contamination that we first observed in our DNBSEQ PE300 run. The phenomenon, which we refer to as "software contamination", is not actual contamination but primarily manifests as improper forward/reverse read pairing, improper demultiplexing, or "digital chimeric" reads. Although rare, these artifacts were found in every run we analyzed, including several MGI demo datasets (both PE100 and PE150). We demonstrate that these artifacts arise primarily from the incorrect resolution of sequencing signals produced by neighboring DNA nanoballs, which leads to mix-ups between forward and reverse reads or to improper demultiplexing. The artifacts occur most frequently in read pairs whose insert is shorter than the read length. Based on a few external NA12878 human exome sequencing datasets, we conclude that the total improper pairing rate in DNBSEQ data is comparable to that in Illumina data. Overall, the problem only affects the analysis results when simultaneously sequenced libraries have markedly different insert size distributions or flow cell loading. Additionally, we demonstrate that raw DNBSEQ data might contain ~2% optical duplicates, resulting from the same effect of close proximity of DNB sites on the flow cell.
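The abstract's observation that artifacts concentrate in read pairs whose insert is shorter than the read length can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The `PairRecord` type, its field names, and the function `improper_rate_by_insert_class` are hypothetical, and the sketch assumes that each pair's insert size is known independently (for example, from the library design) and that an aligner has already flagged whether the pair is properly paired.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class PairRecord:
    """One read pair (hypothetical record, not a real file format)."""
    insert_size: int       # insert length in bp, assumed known independently
    read_length: int       # sequenced read length, e.g. 300 for PE300
    properly_paired: bool  # True if the aligner marked the pair as proper


def improper_rate_by_insert_class(
    pairs: Iterable[PairRecord],
) -> Dict[str, Optional[float]]:
    """Return the improper pairing rate for short-insert and long-insert pairs.

    A pair is "short_insert" when its insert is shorter than the read length.
    A class with no pairs gets None instead of a rate.
    """
    counts = {"short_insert": [0, 0], "long_insert": [0, 0]}  # [improper, total]
    for p in pairs:
        key = "short_insert" if p.insert_size < p.read_length else "long_insert"
        counts[key][1] += 1
        if not p.properly_paired:
            counts[key][0] += 1
    return {
        key: (improper / total if total else None)
        for key, (improper, total) in counts.items()
    }
```

Calling `improper_rate_by_insert_class` on the pairs from one library gives two rates that can be compared directly. A markedly higher rate in the `short_insert` class would be consistent with the pattern the abstract describes.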


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/180f/12189395/2d8976b63a9a/biology-14-00670-g001.jpg
