School of Life Sciences, University of Nottingham, Nottingham, UK.
DeepSeq, School of Life Sciences, University of Nottingham, Nottingham, UK.
Bioinformatics. 2019 Jul 1;35(13):2193-2198. doi: 10.1093/bioinformatics/bty841.
The Oxford Nanopore Technologies (ONT) MinION is used for sequencing a wide variety of sample types with diverse methods of sample extraction. Nanopore sequencers output FAST5 files containing signal data, which are subsequently base called to FASTQ format. Optionally, ONT devices can collect data from all sequencing channels simultaneously in a bulk FAST5 file, enabling inspection of the signal in any channel at any point. We sought to visualize this signal to inspect challenging or difficult-to-sequence samples.
The BulkVis tool loads a bulk FAST5 file, overlays MinKNOW (the software that controls ONT sequencers) classifications on the signal trace, and can show mappings to a reference. Users can navigate to a channel and time or, given a FASTQ header from a read, jump to its specific position. BulkVis can export regions as Nanopore base caller compatible reads. Using BulkVis, we find that long reads can be incorrectly divided by MinKNOW, resulting in single DNA molecules being split into two or more reads. The longest seen to date is 2 272 580 bases in length and was reported in eleven consecutive reads. We provide helper scripts that identify and reconstruct split reads given a sequencing summary file and alignment to a reference. We note that incorrect read splitting appears to vary according to input sample type and is more common in 'ultra-long' read preparations.
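The idea of identifying split reads from a sequencing summary file can be illustrated with a minimal sketch. This is not the BulkVis helper scripts themselves: it covers only the timing step (reads that follow one another on the same channel with almost no gap) and omits the alignment check against a reference. The function name `candidate_split_reads`, the `max_gap` threshold of 1 second, and the column names `read_id`, `channel`, `start_time` and `duration` are assumptions made for illustration.

```python
import csv
import io
from itertools import groupby


def candidate_split_reads(summary_tsv, max_gap=1.0):
    """Return lists of read IDs that are adjacent in time on one channel.

    summary_tsv: open text handle of a tab-separated summary table
                 (assumed columns: read_id, channel, start_time, duration).
    max_gap:     hypothetical threshold, in seconds, between the end of one
                 read and the start of the next.
    """
    rows = list(csv.DictReader(summary_tsv, delimiter="\t"))
    # Order reads by channel, then by start time within each channel.
    rows.sort(key=lambda r: (int(r["channel"]), float(r["start_time"])))

    chains = []
    for _, reads in groupby(rows, key=lambda r: int(r["channel"])):
        chain, prev_end = [], None
        for r in reads:
            start = float(r["start_time"])
            if prev_end is not None and start - prev_end <= max_gap:
                # Read starts right after the previous one: extend the chain.
                chain.append(r["read_id"])
            else:
                # Gap too large: keep the finished chain only if it has
                # more than one read, then start a new chain.
                if len(chain) > 1:
                    chains.append(chain)
                chain = [r["read_id"]]
            prev_end = start + float(r["duration"])
        if len(chain) > 1:
            chains.append(chain)
    return chains


# Made-up example data, for illustration only.
EXAMPLE = (
    "read_id\tchannel\tstart_time\tduration\n"
    "a\t1\t0.0\t10.0\n"
    "b\t1\t10.2\t5.0\n"
    "c\t1\t30.0\t4.0\n"
    "d\t2\t1.0\t3.0\n"
    "e\t2\t4.1\t2.0\n"
    "f\t2\t6.3\t1.0\n"
)

if __name__ == "__main__":
    print(candidate_split_reads(io.StringIO(EXAMPLE)))
```

Chains found this way are only candidates. Confirming that the reads in a chain derive from a single DNA molecule would still require checking that they map contiguously to the reference, as the abstract describes.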
The software is available freely under an MIT license at https://github.com/LooseLab/bulkvis.
Supplementary data are available at Bioinformatics online.