T-S2Inet: Transformer-based sequence-to-image network for accurate nanopore sequence recognition.

Author information

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing 211106, China.

Key Laboratory of Brain-Machine Intelligence Technology, Ministry of Education, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China.

Publication information

Bioinformatics. 2024 Feb 1;40(2). doi: 10.1093/bioinformatics/btae083.

DOI:10.1093/bioinformatics/btae083

PMID:38366607

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10902682/

Abstract

MOTIVATION

Nanopore sequencing is a new macromolecular recognition and perception technology that enables high-throughput sequencing of DNA, RNA, and even protein molecules. The sequences generated by nanopore sequencing span a large time frame, and the labor and time costs incurred by traditional analysis methods are substantial. Recently, research on nanopore data analysis using machine learning algorithms has gained steady momentum, but there is often a significant gap between traditional and deep learning methods in terms of classification results. To analyze nanopore data with deep learning, measures such as sequence completion and sequence transformation can be employed. However, these techniques do not preserve the local features of the sequences. To address this issue, we propose a sequence-to-image (S2I) module that transforms sequences of unequal length into images. Additionally, we propose the Transformer-based T-S2Inet model to capture the important information and improve classification accuracy.
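
As a rough illustration of the sequence-to-image idea, the sketch below rasterizes a one-dimensional signal of arbitrary length onto a fixed-size grid, so that inputs of unequal length yield images of equal shape. The abstract does not specify how the S2I module builds its images, so the function name `sequence_to_image`, the grid size, and the rasterization scheme are all assumptions made for illustration, not the authors' implementation.

```python
def sequence_to_image(signal, height=16, width=16):
    """Rasterize a 1-D signal of any length onto a fixed height x width grid.

    Hypothetical illustration only; this is not the S2I module from the paper.
    Each pixel counts how many samples fall into its (time, amplitude) cell.
    """
    if not signal:
        raise ValueError("signal must contain at least one sample")

    lo, hi = min(signal), max(signal)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat signal
    n = len(signal)

    image = [[0] * width for _ in range(height)]
    for i, value in enumerate(signal):
        # Time axis: map sample index i in [0, n) to a column in [0, width).
        col = min(i * width // n, width - 1)
        # Amplitude axis: row 0 is the top, so larger values get smaller rows.
        row = (height - 1) - int((value - lo) / span * (height - 1))
        image[row][col] += 1
    return image


# Two signals of very different lengths map to images of the same shape.
short_img = sequence_to_image([0.1, 0.5, 0.9, 0.4])
long_img = sequence_to_image([0.01 * k for k in range(500)])
```

In this sketch each image column summarizes a contiguous time window, so local signal shape may be retained in a way that padding or truncating to a fixed length would not guarantee. The resulting grid could then be passed to an image classifier.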

RESULTS

Quantitative and qualitative analyses show that the proposed method improves accuracy by around 2% compared to previous methods. The proposed method is adaptable to other nanopore platforms, such as Oxford Nanopore. Notably, the proposed method not only aims to achieve state-of-the-art performance but also provides a general approach for analyzing nanopore sequences of unequal length.

AVAILABILITY AND IMPLEMENTATION

The main program is available at https://github.com/guanxiaoyu11/S2Inet.
