Generating barcodes for nanopore sequencing data with PRO.

Author information

Yu Ting, Ren Zitong, Gao Xin, Li Guojun, Han Renmin

Affiliations

Research Center for Mathematics and Interdisciplinary Sciences, Frontiers Science Center for Nonlinear Expectations (Ministry of Education), Shandong University, Shandong 266000, China.

Computer, Electrical and Mathematical Sciences and Engineering Division & Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia.

Publication information

Fundam Res. 2024 Apr 25;4(4):785-794. doi: 10.1016/j.fmre.2024.04.014. eCollection 2024 Jul.

Abstract

DNA barcodes, short and unique DNA sequences, play a crucial role in sample identification when processing many samples simultaneously, which helps reduce experimental costs. Nevertheless, the low quality of long-read sequencing makes it difficult to identify barcodes accurately, which poses significant challenges for the design of barcodes for large numbers of samples in a single sequencing run. Here, we present a comprehensive study of the generation of barcodes and develop a tool, PRO, that can be used for selecting optimal barcode sets and demultiplexing. We formulate the barcode design problem as a combinatorial problem and prove that finding the optimal largest barcode set in a given DNA sequence space in which all sequences have the same length is theoretically NP-complete. For practical applications, we developed the novel method PRO by introducing the probability divergence between two DNA sequences to expand the capacity of barcode kits while ensuring demultiplexing accuracy. Specifically, the maximum size of the barcode kits designed by PRO is 2,292, which keeps the length of barcodes the same as that of the official ones used by Oxford Nanopore Technologies (ONT). We validated the performance of PRO on a simulated nanopore dataset with high error rates. The demultiplexing accuracy of PRO reached 98.29% for a barcode kit of size 2,922, 4.31% higher than that of Guppy, the official demultiplexing tool. When the size of the barcode kit generated by PRO is the same as the official size provided by ONT, both tools show superior and comparable demultiplexing accuracy.
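The combinatorial formulation above can be illustrated with a minimal sketch. It is not PRO's algorithm: the abstract does not define the probability divergence, so the sketch substitutes plain Levenshtein (edit) distance as the separation measure and uses a simple greedy heuristic. The function names (`edit_distance`, `greedy_barcode_set`, `demultiplex`) and the parameters `length` and `min_dist` are hypothetical and chosen only for illustration.

```python
from itertools import product


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def greedy_barcode_set(length: int, min_dist: int) -> list[str]:
    """Keep each candidate that is at least min_dist away from all kept ones.

    Heuristic only (an assumption of this sketch, not PRO's method): the
    result is a maximal set, not necessarily the largest possible one.
    """
    chosen: list[str] = []
    for candidate in map("".join, product("ACGT", repeat=length)):
        if all(edit_distance(candidate, kept) >= min_dist for kept in chosen):
            chosen.append(candidate)
    return chosen


def demultiplex(read: str, barcodes: list[str]) -> str:
    """Assign a read to the barcode with the smallest edit distance."""
    return min(barcodes, key=lambda bc: edit_distance(read, bc))
```

Because the greedy pass only guarantees a maximal set, it gives a lower bound on the achievable kit size for a given `length` and `min_dist`. Enumerating all sequences is feasible only for short toy lengths, which is consistent with the hardness result stated in the abstract.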

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b35a/11630701/e351376d9ee0/ga1.jpg
