

Estimated Nucleotide Reconstruction Quality Symbols of Basecalling Tools for Oxford Nanopore Sequencing.

Affiliation

Institute of Computer Science, Warsaw University of Technology, 00-661 Warsaw, Poland.

Publication information

Sensors (Basel). 2023 Jul 29;23(15):6787. doi: 10.3390/s23156787.

Abstract

Nanopore sequencing is currently one of the fastest-growing DNA sequencing technologies. A key stage in processing sequencer data is basecalling, in which the sequence of currents measured at the sequencer's nanopores is used to reconstruct DNA sequences, called DNA reads. Many basecalling applications report, together with the DNA sequence, the estimated quality of the reconstruction of each nucleotide (quality symbols are stored on every fourth line of the FASTQ file; each nucleotide in the FASTQ file corresponds to exactly one estimated nucleotide reconstruction quality symbol). Herein, we compare the estimated nucleotide reconstruction quality symbols (the symbols on every fourth line of the FASTQ file) reported by different basecallers. The experiments consisted of basecalling the same raw datasets from the nanopore device with different basecallers and comparing the reported quality symbols, which denote the estimated quality of the nucleotide reconstruction. The results show that the estimated quality varies with the tool used, particularly in range and distribution. Moreover, we mapped the basecalled DNA reads to reference genomes and calculated match and mismatch rates for groups of nucleotides with the same quality symbol. Finally, the paper shows that the estimated nucleotide reconstruction quality reported in the basecalling process is not used by any of the investigated tools for processing nanopore DNA reads.
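To make the FASTQ layout described in the abstract concrete, the following is a minimal Python sketch, not the authors' pipeline. It assumes the common Phred+33 encoding (quality score = ASCII code of the symbol minus 33) and strict four-line records; the helper names `read_fastq`, `quality_symbol_counts`, `symbol_to_error_probability`, and `match_rate_by_symbol` are hypothetical and chosen only for illustration. The last helper assumes the alignment to a reference has already been produced by some other tool and is supplied as (read base, reference base, quality symbol) triples.

```python
import io
from collections import Counter

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger-style) quality encoding


def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line) ends the iteration
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # third line: the '+' separator
        quality = handle.readline().rstrip("\n")  # fourth line: quality symbols
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header, sequence, quality


def quality_symbol_counts(handle):
    """Count how often each quality symbol occurs across all reads."""
    counts = Counter()
    for _, _, quality in read_fastq(handle):
        counts.update(quality)
    return counts


def symbol_to_error_probability(symbol):
    """Error probability implied by one symbol: P = 10 ** (-Q / 10)."""
    q = ord(symbol) - PHRED_OFFSET
    return 10 ** (-q / 10)


def match_rate_by_symbol(aligned_bases):
    """Fraction of matching bases per quality symbol.

    aligned_bases: iterable of (read_base, reference_base, quality_symbol).
    """
    matched, total = Counter(), Counter()
    for read_base, ref_base, symbol in aligned_bases:
        total[symbol] += 1
        if read_base == ref_base:
            matched[symbol] += 1
    return {symbol: matched[symbol] / total[symbol] for symbol in total}


# Tiny in-memory example with two made-up reads.
example = io.StringIO("@read1\nACGT\n+\n!+5?\n@read2\nGG\n+\n5I\n")
for symbol, count in sorted(quality_symbol_counts(example).items()):
    print(symbol, count, symbol_to_error_probability(symbol))
```

Comparing the per-symbol counts from two basecallers run on the same raw data gives the range and distribution comparison the abstract refers to, and comparing `match_rate_by_symbol` against `1 - symbol_to_error_probability(symbol)` indicates how closely the reported quality tracks the observed match rate.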


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86be/10422362/741da3b72cef/sensors-23-06787-g001.jpg
