T-S2Inet: Transformer-based sequence-to-image network for accurate nanopore sequence recognition.

Affiliations

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing 211106, China.

Key Laboratory of Brain-Machine Intelligence Technology, Ministry of Education, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China.

Publication information

Bioinformatics. 2024 Feb 1;40(2). doi: 10.1093/bioinformatics/btae083.

Abstract

MOTIVATION

Nanopore sequencing is a new macromolecular recognition and perception technology that enables high-throughput sequencing of DNA, RNA, and even protein molecules. The sequences generated by nanopore sequencing span long time frames, so traditional analysis methods incur substantial labor and time costs. Research on nanopore data analysis using machine learning has recently gained momentum, but there is often a significant gap between traditional and deep learning methods in classification results. To analyze nanopore data with deep learning, measures such as sequence completion and sequence transformation can be employed; however, these measures do not preserve the local features of the sequences. To address this issue, we propose a sequence-to-image (S2I) module that transforms sequences of unequal length into images. Additionally, we propose the Transformer-based T-S2Inet model to capture the important information and improve classification accuracy.

RESULTS

Quantitative and qualitative analyses show an accuracy improvement of around 2% over previous methods. The proposed method is adaptable to other nanopore platforms, such as Oxford Nanopore. Notably, the proposed method not only aims to achieve state-of-the-art performance but also provides a general idea for analyzing nanopore sequences of unequal length.

AVAILABILITY AND IMPLEMENTATION

The main program is available at https://github.com/guanxiaoyu11/S2Inet.
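
The abstract does not describe how the S2I module works internally, so the following is only a minimal sketch of the general idea of turning sequences of unequal length into fixed-size images. It assumes linear resampling to a fixed number of points followed by a row-major reshape; the function name `sequence_to_image` and the `side` parameter are hypothetical and are not taken from the authors' repository, which should be consulted for the actual method.

```python
import numpy as np


def sequence_to_image(signal, side=32):
    """Map a 1-D signal of arbitrary length to a (side, side) array.

    Illustrative assumption, not the published S2I module: the signal is
    linearly resampled to side*side points, min-max scaled to [0, 1],
    and reshaped row by row so neighbouring samples stay adjacent.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size == 0:
        raise ValueError("signal must be non-empty")

    n = side * side
    # Place the original samples and the target samples on a common [0, 1] axis.
    old_x = np.linspace(0.0, 1.0, num=signal.size)
    new_x = np.linspace(0.0, 1.0, num=n)
    resampled = np.interp(new_x, old_x, signal)

    # Min-max scaling; a constant signal maps to all zeros.
    lo, hi = resampled.min(), resampled.max()
    scaled = (resampled - lo) / (hi - lo) if hi > lo else np.zeros(n)

    return scaled.reshape(side, side)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    short_trace = rng.normal(size=500)   # stand-in for a short current trace
    long_trace = rng.normal(size=3000)   # stand-in for a long current trace
    print(sequence_to_image(short_trace, side=16).shape)
    print(sequence_to_image(long_trace, side=16).shape)
```

Both traces end up with the same shape, so they can be batched for an image classifier without padding. Resampling can still smooth or stretch fine detail, so whether local features survive depends on the chosen `side` relative to the original sequence lengths.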

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/680b/10902682/45f109b96d4e/btae083f1.jpg
