Bonenfant Quentin, Noé Laurent, Touzet Hélène
Univ. Lille, CNRS, Centrale Lille, UMR 9189 - CRIStAL-Centre de Recherche en Informatique Signal et Automatique de Lille, Lille F-59000, France.
Bioinform Adv. 2022 Nov 21;3(1):vbac085. doi: 10.1093/bioadv/vbac085. eCollection 2023.
Oxford Nanopore Technologies (ONT) sequencing has become very popular over the past few years and offers a cost-effective solution for many genomic and transcriptomic projects. One distinctive feature of the technology is that the protocol includes the ligation of adapters to both ends of each fragment. Those adapters should then be removed before downstream analyses, either during the basecalling step or by explicit trimming. This basic task may be tricky when the definition of the adapter sequence is not well documented.
We have developed a new method that scans a set of ONT reads to determine whether it contains adapters, without any prior knowledge of the sequence of the potential adapters, and then trims those adapters out. The algorithm is based on approximate k-mers and is able to discover adapter sequences based on their frequency alone. The method was successfully tested on a variety of ONT datasets with different flowcells, sequencing kits and basecallers.
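To illustrate the idea of finding adapters from frequency alone, here is a minimal sketch. It is not the Porechop_ABI algorithm: it counts exact k-mers rather than approximate ones, and it makes no attempt to assemble the frequent k-mers into a full adapter sequence. The function names and the parameters `k`, `window` and `min_fraction` are hypothetical and chosen only for this example.

```python
from collections import Counter


def count_end_kmers(reads, k=16, window=100):
    """Count, for each exact k-mer, how many reads contain it
    within their first `window` bases (illustrative only)."""
    counts = Counter()
    for read in reads:
        start = read[:window]
        # A set counts each k-mer once per read, so a single
        # repetitive read cannot dominate the totals.
        seen = {start[i:i + k] for i in range(len(start) - k + 1)}
        counts.update(seen)
    return counts


def frequent_kmers(reads, k=16, window=100, min_fraction=0.5):
    """Return the k-mers present at the start of at least
    `min_fraction` of the reads, in sorted order."""
    counts = count_end_kmers(reads, k, window)
    threshold = min_fraction * len(reads)
    return sorted(kmer for kmer, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    # Toy reads sharing a made-up 8-base prefix, standing in for an adapter.
    toy_reads = [
        "ACGTTGCA" + "AAAAAAAA",
        "ACGTTGCA" + "CCCCCCCC",
        "ACGTTGCA" + "GGGGTTTT",
    ]
    print(frequent_kmers(toy_reads, k=4, window=8, min_fraction=1.0))
```

K-mers that come from a shared adapter recur across most reads, while k-mers from the biological inserts do not, which is why a simple frequency threshold can separate the two in this toy setting. Real ONT reads contain sequencing errors, which is the motivation for using approximate rather than exact k-mers.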
The resulting software, named Porechop_ABI, is open-source and is available at https://github.com/bonsai-team/Porechop_ABI.
Supplementary data are available at Bioinformatics Advances online.