College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing 211106, China.
Key Laboratory of Brain-Machine Intelligence Technology, Ministry of Education, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China.
Bioinformatics. 2024 Feb 1;40(2). doi: 10.1093/bioinformatics/btae083.
Nanopore sequencing is a new macromolecular recognition and perception technology that enables high-throughput sequencing of DNA, RNA, and even protein molecules. The sequences it generates span long time frames, so traditional analysis methods incur substantial labor and time costs. Research on nanopore data analysis using machine learning algorithms has recently gained momentum, but there is often a significant gap between traditional and deep learning methods in classification results. To analyze nanopore data with deep learning, measures such as sequence completion and sequence transformation can be employed; however, these do not preserve the local features of the sequences. To address this issue, we propose a sequence-to-image (S2I) module that transforms sequences of unequal length into images. Additionally, we propose the Transformer-based T-S2Inet model to capture the important information and improve classification accuracy.
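To make the idea of mapping unequal-length sequences to fixed-size images concrete, the following is a minimal illustrative sketch. It is not the S2I module described here, whose internals are not specified in this abstract. It assumes a sequence is a one-dimensional numeric signal, and the function names `pad_to_length` and `sequence_to_image`, the `side` parameter, and the choice of linear resampling are all hypothetical. The first function shows plain fixed-length completion for contrast; the second resamples each signal onto a square grid so that each image row holds a contiguous segment of the signal.

```python
import numpy as np


def pad_to_length(seq, length, fill=0.0):
    """Baseline 'sequence completion': truncate or right-pad to a fixed length."""
    seq = np.asarray(seq, dtype=np.float32)
    out = np.full(length, fill, dtype=np.float32)
    n = min(seq.size, length)
    out[:n] = seq[:n]
    return out


def sequence_to_image(seq, side=32):
    """Hypothetical stand-in for a sequence-to-image step (NOT the authors' S2I).

    Linearly resamples a 1D signal of any length >= 2 onto side*side points,
    min-max normalises it to [0, 1], and reshapes it row by row into an image.
    """
    seq = np.asarray(seq, dtype=np.float32)
    if seq.ndim != 1 or seq.size < 2:
        raise ValueError("expected a 1D sequence with at least two samples")

    # Place source and target samples on a common [0, 1] axis and interpolate.
    src = np.linspace(0.0, 1.0, num=seq.size)
    dst = np.linspace(0.0, 1.0, num=side * side)
    resampled = np.interp(dst, src, seq)

    # Min-max normalise; a constant signal maps to all zeros.
    lo, hi = resampled.min(), resampled.max()
    if hi > lo:
        resampled = (resampled - lo) / (hi - lo)
    else:
        resampled = np.zeros_like(resampled)

    # Each row of the image is a contiguous chunk of the resampled signal.
    return resampled.reshape(side, side).astype(np.float32)


if __name__ == "__main__":
    short = np.sin(np.linspace(0.0, 6.0, 500))
    long_ = np.sin(np.linspace(0.0, 6.0, 1200))
    print(sequence_to_image(short, side=16).shape)
    print(sequence_to_image(long_, side=16).shape)
```

Under these assumptions, signals of 500 and 1200 samples both yield images of the same shape, which is what allows an image-based classifier to consume them in one batch. Whether such a mapping retains enough local structure for a given dataset would need to be checked empirically.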
Quantitative and qualitative analyses show an improvement of around 2% in accuracy over previous methods. The proposed method is adaptable to other nanopore platforms, such as Oxford Nanopore. Notably, the method not only aims for state-of-the-art performance but also offers a general approach to analyzing nanopore sequences of unequal length.
The main program is available at https://github.com/guanxiaoyu11/S2Inet.