Brownian motion data augmentation: a method to push neural network performance on nanopore sensors.

Authors

Kipen Javier, Jaldén Joakim

Affiliation

Division of Information Science and Engineering, Kungliga Tekniska Högskolan, Stockholm, 114 28, Sweden.

Publication

Bioinformatics. 2025 May 29;41(6). doi: 10.1093/bioinformatics/btaf323.

Abstract

MOTIVATION

Nanopores are highly sensitive sensors that have achieved commercial success in DNA/RNA sequencing, with potential applications in protein sequencing and biomarker identification. Solid-state nanopores, in particular, face challenges such as instability and low signal-to-noise ratios (SNRs), which lead scientists to adopt data-driven methods for nanopore signal analysis, although data acquisition remains restrictive.

RESULTS

We address this data scarcity by augmenting the training samples with traces that emulate Brownian motion effects, based on dynamic models in the literature. We apply this method to a publicly available dataset of a classification task containing nanopore reads of DNA with encoded barcodes. A neural network named QuipuNet was previously published for this dataset, and we demonstrate that our augmentation method produces a noticeable increase in QuipuNet's accuracy. Furthermore, we introduce a novel neural network named YupanaNet, which achieves greater accuracy (95.8%) than QuipuNet (94.6%) on the same dataset. YupanaNet benefits from both the enhanced generalization provided by Brownian motion data augmentation and the incorporation of novel architectures, including skip connections and a soft attention mask.
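To make the augmentation idea concrete, the following is a minimal sketch of one way Brownian-motion effects could be emulated on a current trace. It is not the authors' implementation (that is in the repository listed under Availability). It assumes that the molecule's position in the pore advances with a constant drift plus Gaussian fluctuations, so that each augmented trace is the original trace resampled along a randomly perturbed time axis. The function name `brownian_time_warp` and the parameter `sigma` are hypothetical and chosen only for illustration.

```python
import numpy as np


def brownian_time_warp(trace, sigma=0.05, rng=None):
    """Return a copy of `trace` resampled along a randomly warped time axis.

    Assumption (illustrative, not taken from the paper): position along the
    molecule advances by one sample per step on average, plus a Gaussian
    fluctuation of standard deviation `sigma` per step.
    """
    rng = np.random.default_rng() if rng is None else rng
    trace = np.asarray(trace, dtype=float)
    n = len(trace)

    # Unit drift plus random fluctuation; clipping keeps every step positive,
    # so the warped time axis stays strictly increasing.
    steps = np.clip(1.0 + sigma * rng.standard_normal(n - 1), 0.05, None)
    positions = np.concatenate(([0.0], np.cumsum(steps)))

    # Rescale so the warped axis covers the same index range as the original.
    positions *= (n - 1) / positions[-1]

    # Read the original trace at the warped positions (linear interpolation).
    return np.interp(positions, np.arange(n), trace)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    original = np.sin(np.linspace(0.0, 3.0, 200))  # stand-in for a nanopore read
    augmented = [brownian_time_warp(original, sigma=0.3, rng=rng) for _ in range(4)]
    print(len(augmented), augmented[0].shape)
```

Each call produces a trace of the same length with the same endpoints, but with features locally stretched or compressed, so several augmented copies of one labeled read can be added to the training set. A physically grounded model would tie `sigma` to diffusion and drift parameters from the literature rather than treating it as a free value.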

AVAILABILITY AND IMPLEMENTATION

The source code and data are available at: https://github.com/JavierKipen/browDataAug.

SUPPLEMENTARY INFORMATION

Supplementary information is available at Bioinformatics online.
