Taheri Ghahfarokhi Seyed Mohammad Amin, Peña-Castillo Lourdes
Department of Computer Science, Memorial University of Newfoundland, St. John's, Newfoundland A1B 3X5, Canada.
Department of Biology, Memorial University of Newfoundland, St. John's, Newfoundland A1B 3X9, Canada.
NAR Genom Bioinform. 2025 Mar 8;7(1):lqaf016. doi: 10.1093/nargab/lqaf016. eCollection 2025 Mar.
A terminator is a DNA region that ends the transcription process. Multiple computational tools are currently available for predicting bacterial terminators. However, these methods are specialized for certain bacteria or terminator types (i.e. intrinsic or factor-dependent). In this work, we developed BacTermFinder, an ensemble of convolutional neural networks (CNNs) that receives as input four different representations of terminator sequences. To develop BacTermFinder, we collected roughly 41 000 bacterial terminators (intrinsic and factor-dependent) from 22 species with varying GC-content (from 28% to 71%) from published studies that used RNA-seq technologies. We evaluated BacTermFinder's performance on terminators of five bacterial species (not used for training BacTermFinder) and two archaeal species, and compared it with that of four other bacterial terminator prediction tools. Based on our results, BacTermFinder outperforms the four other approaches in terms of average recall without increasing the number of false positives. Moreover, BacTermFinder identifies both types of terminators (intrinsic and factor-dependent) and generalizes to archaeal terminators. Additionally, we visualized the saliency maps of the CNNs to gain insight into the terminator motifs of each species. BacTermFinder is publicly available at https://github.com/BioinformaticsLabAtMUN/BacTermFinder.
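To make the ensemble idea and the recall metric concrete, the following minimal Python sketch shows one way that scores from several models could be combined and evaluated. It is an illustration under stated assumptions, not BacTermFinder's implementation: the abstract does not name the four sequence representations, so one-hot encoding appears here only as a familiar example, and averaging the per-model probabilities (soft voting) is an assumed combination rule. All function names are hypothetical.

```python
from statistics import mean


def one_hot(seq: str) -> list[list[int]]:
    """Encode a DNA sequence as one row per base over the alphabet A, C, G, T.

    Shown only as an example of a sequence representation; it is an
    assumption, not necessarily one of the four used by BacTermFinder.
    """
    alphabet = "ACGT"
    return [[int(base == letter) for letter in alphabet] for base in seq.upper()]


def ensemble_probability(model_probs: list[float]) -> float:
    """Combine per-model terminator probabilities by simple averaging.

    Soft voting is an assumed combination rule for illustration.
    """
    return mean(model_probs)


def recall(y_true: list[int], y_pred: list[int]) -> float:
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


if __name__ == "__main__":
    # Hypothetical scores from four models, one per input representation.
    scores = [0.2, 0.4, 0.6, 0.8]
    print(one_hot("AC"))
    print(ensemble_probability(scores))
    print(recall([1, 1, 0, 1], [1, 0, 0, 1]))
```

In this sketch a window would be called a terminator when the averaged probability exceeds a chosen threshold; recall then measures the fraction of known terminators recovered at that threshold.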