Department of Computer Science, Universidad Autónoma de Manizales, Manizales, Colombia.
Center for Technology Development Bioprocess and Agroindustry Plant, Department of Systems and Informatics, Universidad de Caldas, Manizales, Colombia.
PLoS One. 2023 Sep 21;18(9):e0291925. doi: 10.1371/journal.pone.0291925. eCollection 2023.
Analysis of eukaryotic genomes requires the detection and classification of transposable elements (TEs), a crucial but complex and time-consuming task. To improve the performance of tools that accomplish these tasks, machine learning (ML) approaches that leverage computing resources such as graphics processing units (GPUs) and multiple central processing unit (CPU) cores have been adopted. However, until now, the use of ML techniques has mostly been limited to the classification of TEs. Herein, a detection-classification strategy (named YORO) based on convolutional neural networks is adapted from computer vision (YOLO) to genomics. This approach detects genomic objects in large DNA sequences, such as fully sequenced genomes, by predicting their position, length, and class. As a proof of concept, the internal protein-coding domains of LTR retrotransposons are used to train the proposed neural network. Performance was measured with precision, recall, accuracy, F1-score, execution times, and time ratios, and was also assessed through several graphical representations. These promising results open the door to a new generation of deep learning tools for genomics. The YORO architecture is available at https://github.com/simonorozcoarias/YORO.
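The abstract does not specify YORO's output encoding, so the sketch below only illustrates the general idea of YOLO-style detection carried over to one dimension. It makes the following assumptions: the sequence is split into fixed-size cells, and each cell predicts an objectness score, a start offset within the cell, a length, and class scores. Decoded predictions are then scored with precision, recall, and F1 through 1D intersection-over-union matching. All names (`Detection`, `decode_grid`, `iou_1d`, `precision_recall_f1`, `cell_len`, `max_len`, `threshold`, `min_iou`) are hypothetical and are not taken from the YORO repository.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Detection:
    start: float   # start coordinate in base pairs
    length: float  # length in base pairs
    label: int     # class index (hypothetically, a protein-domain type)
    score: float   # objectness score (1.0 for ground truth)

    @property
    def end(self) -> float:
        return self.start + self.length


def decode_grid(grid: Sequence[Sequence[float]], cell_len: int, max_len: int,
                threshold: float = 0.5) -> List[Detection]:
    """Convert per-cell outputs [p, x, w, class_scores...] into detections.

    Assumed (hypothetical) layout: p = objectness, x = start offset inside the
    cell as a fraction of cell_len, w = length as a fraction of max_len.
    """
    detections = []
    for i, cell in enumerate(grid):
        p, x, w, *class_scores = cell
        if p < threshold:
            continue  # cell predicts no object
        label = max(range(len(class_scores)), key=class_scores.__getitem__)
        start = (i + x) * cell_len  # cell origin plus relative offset
        detections.append(Detection(start, w * max_len, label, p))
    return detections


def iou_1d(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Intersection over union of two intervals on the sequence."""
    inter = max(0.0, min(a_end, b_end) - max(a_start, b_start))
    union = (a_end - a_start) + (b_end - b_start) - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(preds: List[Detection], truth: List[Detection],
                        min_iou: float = 0.5) -> Tuple[float, float, float]:
    """Greedy one-to-one matching, highest-scoring predictions first.

    A prediction counts as a true positive if it overlaps a not-yet-matched
    ground-truth object of the same class with IoU >= min_iou.
    """
    matched = set()
    tp = 0
    for d in sorted(preds, key=lambda d: d.score, reverse=True):
        best, best_iou = None, min_iou
        for j, t in enumerate(truth):
            if j in matched or t.label != d.label:
                continue
            iou = iou_1d(d.start, d.end, t.start, t.end)
            if iou >= best_iou:
                best, best_iou = j, iou
        if best is not None:
            matched.add(best)
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With `cell_len=100` and `max_len=400`, a cell at index 2 that predicts `x=0.5` and `w=0.25` decodes to an object starting at 250 bp with a length of 100 bp. The cell grid bounds only where an object starts, not how long it is, so an object can extend past its own cell. This is what allows a single-pass detector to handle elements of varying length.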