cONcat: Computational reconstruction of concatenated fragments from long Oxford Nanopore reads.

Authors

Petri Alexander J, Thi-Huyen Nguyen Mai, Rajwar Anjali, Benson Erik, Sahlin Kristoffer

Affiliations

Department of Mathematics, Science for Life Laboratory, Stockholm University, Stockholm, Sweden.

Department of Computer Science, University of Helsinki, Helsinki, Finland.

Publication information

PLoS One. 2025 Jul 24;20(7):e0321246. doi: 10.1371/journal.pone.0321246. eCollection 2025.

DOI:10.1371/journal.pone.0321246

PMID:40705736

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12289010/

Abstract

Synthetic combinatorial DNA libraries are widely used to produce protein variants, to optimize binders, and to study protein-DNA interactions at high throughput. The libraries can be made by researchers or vendors, and high-throughput sequencing is used both for quality control and for studying the outcome of selection experiments. Oxford Nanopore (ONT) sequencing is well suited to this task because it provides long reads and can be run rapidly on low-cost instrumentation. However, it suffers from lower overall read accuracy and an uneven error profile. No current bioinformatics tool is well suited to deducing the composition and order of the constituent members of combinatorial libraries from ONT reads. We introduce cONcat, an algorithm that identifies the makeup of concatenated DNA fragments in a set of ONT sequencing reads, given a pool of known fragments. cONcat uses an edit distance-based recursive covering algorithm to find the best possible matchings between the fragments and the reads. In our experiments on simulated and experimental data, cONcat accurately detects the correct fragment coverings despite the short fragment sizes (< 20 bp) and the sequencing errors present in ONT reads. However, we find that the high error rates at the start of ONT reads make it challenging to obtain confident coverage there, implying a need for experimental strategies that avoid placing key sequence information at the start of reads.
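
The abstract describes the method only as an edit distance-based recursive covering of reads by known fragments. The Python sketch below illustrates that general idea and should not be read as the cONcat implementation. It assumes a greedy strategy: find the fragment with the lowest-error approximate match anywhere in the read, accept it if it falls under an error threshold, then recurse on the uncovered sequence to its left and right. The function names, the `max_err_rate` parameter, and the toy fragment pool are all hypothetical.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, int, int, int]  # (fragment name, start, end, edit distance)


def best_infix_match(fragment: str, text: str) -> Tuple[int, int, int]:
    """Return (distance, start, end) for the substring text[start:end]
    that has the smallest edit distance to `fragment`.

    Semi-global dynamic programming: the fragment must be aligned in full,
    but it may begin and end anywhere in the text at no cost.
    """
    n = len(text)
    # Row 0: an empty fragment prefix matches at any text position for free.
    prev = [(0, j) for j in range(n + 1)]  # (cost, start of the match)
    for i in range(1, len(fragment) + 1):
        cur = [(i, 0)]  # fragment[:i] against empty text: i unmatched bases
        for j in range(1, n + 1):
            mismatch = fragment[i - 1] != text[j - 1]
            cur.append(min(
                (prev[j - 1][0] + mismatch, prev[j - 1][1]),  # (mis)match
                (prev[j][0] + 1, prev[j][1]),       # fragment base missing in read
                (cur[j - 1][0] + 1, cur[j - 1][1]),  # extra base in read
            ))
        prev = cur
    end = min(range(n + 1), key=lambda j: prev[j][0])
    dist, start = prev[end]
    return dist, start, end


def cover(read: str, fragments: Dict[str, str],
          max_err_rate: float = 0.2, offset: int = 0) -> List[Hit]:
    """Greedy recursive covering of `read`; hits are returned left to right."""
    if not read:
        return []
    best = None
    for name, seq in fragments.items():
        dist, start, end = best_infix_match(seq, read)
        rate = dist / len(seq)
        # Require a non-empty match so that every recursion step shrinks the read.
        if end > start and rate <= max_err_rate:
            if best is None or rate < best[0]:
                best = (rate, name, start, end, dist)
    if best is None:
        return []  # nothing matches well enough; leave this region uncovered
    _, name, start, end, dist = best
    left = cover(read[:start], fragments, max_err_rate, offset)
    right = cover(read[end:], fragments, max_err_rate, offset + end)
    return left + [(name, offset + start, offset + end, dist)] + right


# Toy pool of 10 bp fragments (made-up sequences, not from the paper).
pool = {"A": "ACGTACGTAC", "B": "TTGGCCAAGG", "C": "GATTACAGAT"}
# Concatenation A + B + C with one base deleted from B to mimic a read error.
read = "ACGTACGTAC" + "TTGGCCAGG" + "GATTACAGAT"
for hit in cover(read, pool):
    print(hit)
```

A greedy best-first choice is only one way to realize a recursive covering; a real tool would also have to handle reverse-complemented reads, ambiguous ties between similar fragments, and the elevated error rate at read starts that the abstract highlights.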
