cONcat: Computational reconstruction of concatenated fragments from long Oxford Nanopore reads.

Author information

Petri Alexander J, Thi-Huyen Nguyen Mai, Rajwar Anjali, Benson Erik, Sahlin Kristoffer

Affiliations

Department of Mathematics, Science for Life Laboratory, Stockholm University, Stockholm, Sweden.

Department of Computer Science, University of Helsinki, Helsinki, Finland.

Publication information

PLoS One. 2025 Jul 24;20(7):e0321246. doi: 10.1371/journal.pone.0321246. eCollection 2025.

Abstract

Synthetic combinatorial DNA libraries are widely used to produce protein variants, to optimize binders, and for high-throughput studies of protein-DNA interactions. The libraries can be made by researchers or vendors, and high-throughput sequencing is used both for quality control and to study the outcome of selection experiments. Oxford Nanopore sequencing (ONT) is well suited to this because it produces long reads and can be done rapidly with low-cost instrumentation. However, it suffers from lower overall read accuracy and an uneven error profile. No current bioinformatics tool is well suited to deducing the composition and order of the constituent members of combinatorial libraries from ONT reads. We introduce cONcat, an algorithm that, given a pool of known fragments, identifies the makeup of concatenated DNA fragments in a set of ONT sequencing reads. cONcat uses an edit distance-based recursive covering algorithm to find the best possible matchings between the fragments and the reads. In our experiments on simulated and experimental data, cONcat accurately detects the correct fragment coverings given the short fragment sizes (< 20 bp) and the sequencing errors present in ONT reads. However, we find that the high error rates at the start of ONT reads make it challenging to get confident coverage there, which suggests a need for experimental strategies that avoid placing key sequence information at the start of reads.
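The abstract does not spell out how the recursive covering works, so the following is only a minimal sketch of one plausible reading, not the cONcat implementation: find the fragment that matches some substring of the read with the lowest relative edit distance, record it, then recurse on the uncovered parts to its left and right. The function names, the fragment names and sequences, and the `max_error_rate` threshold are all hypothetical and chosen for illustration.

```python
def best_infix_match(fragment, text):
    """Find the substring of `text` closest to `fragment` in edit distance.

    Returns (distance, start, end) so that text[start:end] is the match.
    Uses semi-global dynamic programming: the match may start and end
    anywhere in `text` at no cost.
    """
    n = len(text)
    # prev[j] = (cost, start) for aligning the fragment prefix so far
    # to a substring of text that ends at position j.
    prev = [(0, j) for j in range(n + 1)]
    for i in range(1, len(fragment) + 1):
        curr = [(i, 0)]  # fragment prefix against empty text: i deletions
        for j in range(1, n + 1):
            mismatch = fragment[i - 1] != text[j - 1]
            curr.append(min(
                (prev[j - 1][0] + mismatch, prev[j - 1][1]),  # (mis)match
                (prev[j][0] + 1, prev[j][1]),          # fragment base missing
                (curr[j - 1][0] + 1, curr[j - 1][1]),  # extra base in read
            ))
        prev = curr
    cost, start, end = min((c, s, j) for j, (c, s) in enumerate(prev))
    return cost, start, end


def cover(read, fragments, max_error_rate=0.2, offset=0):
    """Recursively cover `read` with known fragments.

    `fragments` maps a name to its sequence. Returns a list of
    (start, end, name, distance) tuples sorted by start position.
    `max_error_rate` is an assumed acceptance threshold on
    distance / fragment length.
    """
    if not read:
        return []
    best = None
    for name, seq in fragments.items():
        dist, start, end = best_infix_match(seq, read)
        if end > start and dist <= max_error_rate * len(seq):
            score = dist / len(seq)
            if best is None or score < best[0]:
                best = (score, start, end, name, dist)
    if best is None:
        return []  # nothing matches well enough; leave the region uncovered
    _, start, end, name, dist = best
    left = cover(read[:start], fragments, max_error_rate, offset)
    right = cover(read[end:], fragments, max_error_rate, offset + end)
    return left + [(offset + start, offset + end, name, dist)] + right


# Hypothetical 10 bp fragments and a read with one substitution in "B".
example_fragments = {
    "A": "ACGTACGTAC",
    "B": "TTGGCCAATT",
    "C": "GATTACAGAT",
}
example_read = "ACGTACGTAC" + "TTGGCTAATT" + "GATTACAGAT"
example_cover = cover(example_read, example_fragments)
```

On this toy read the sketch recovers the fragments in the order A, B, C and charges the single substitution to B. Because every position is scored independently of where it sits in the read, a sketch like this has no special handling for the elevated error rate at read starts that the abstract reports.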

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76e5/12289010/fd7a446670f8/pone.0321246.g001.jpg
