Evaluation of simulation models to mimic the distortions introduced into squiggles by nanopore sequencers and segmentation algorithms.

Author information

Department of Electrical and Computer Engineering, University of Calgary, Calgary, Alberta, Canada.

Department of Radiology, University of Calgary, Calgary, Alberta, Canada.

Publication information

PLoS One. 2019 Jul 18;14(7):e0219495. doi: 10.1371/journal.pone.0219495. eCollection 2019.

Abstract

Nucleotides ratcheted through the biomolecular pores of nanopore sequencers generate raw picoampere currents, which are segmented into step-current level signals representing the nucleotide sequence. These 'squiggles' are a noisy, distorted representation of the underlying true stepped current levels due to experimental and algorithmic factors. We were interested in developing a simulation model to support a white-box approach to identifying common distortions, rather than relying on the commonly used black-box neural network techniques for basecalling nanopore signals. Dynamic time warped-space averaging (DTWA) techniques can generate a consensus from multiple noisy signals without introducing the key feature distortions that occur with standard averaging. As a preprocessing tool, DTWA could provide cleaner and more accurate current signals for direct RNA or DNA analysis tools. However, DTWA approaches need modification to take advantage of a priori knowledge of a common, underlying gold-standard RNA/DNA sequence. Using experimental data, we derive a simulation model that provides squiggles with known distortions to assist in validating the performance of analysis tools such as DTWA. Simulation models were evaluated by comparing mocked and experimental squiggle characteristics from one Enolase mRNA squiggle group produced by an Oxford MinION nanopore sequencer, and cross-validated using other Enolase, Sequin R1_71_1 and Sequin R2_55_3 mRNA studies. New techniques identified high inserted but low deleted base rates, generating a consistent 1.7:1 ratio of squiggle events to called bases. Similar probability density functions (PDFs) and cumulative distribution functions (CDFs) were found across all studies. Experimental PDFs were not the normal distributions that would be expected if squiggle distortion arose from segmentation algorithm artefacts, or from individual nucleotides randomly interacting with individual nanopores. Matching experimental and mocked CDFs required the assumption that there are unique features associated with individual raw-current data streams. Z-normalized signal-to-noise ratios suggest that intrinsic sensor limitations are responsible for half of the dynamic time warping (DTW) differences between gold-standard and noisy squiggles.
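
To make the quantities named in the abstract concrete, the following is a minimal illustrative sketch, not the authors' simulation model. It mocks a squiggle from a gold-standard step signal by inserting repeated events, z-normalizes both signals, and compares them with a textbook DTW distance. The function names (`z_normalize`, `dtw_distance`, `mock_squiggle`), the geometric insertion scheme, and the Gaussian noise term are all assumptions made for illustration; the abstract itself reports that experimental PDFs were not normal, so the Gaussian noise here is only a placeholder.

```python
import numpy as np


def z_normalize(x):
    """Shift to zero mean and scale to unit standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def dtw_distance(a, b):
    """Classic O(n*m) dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed warping moves.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]


def mock_squiggle(gold, insert_prob, noise_sd, rng):
    """Hypothetical distortion model: repeat each level a geometric number
    of times (insertions only, no deletions), then add Gaussian noise."""
    events = []
    for level in gold:
        events.append(level)
        # Each extra event is added with probability insert_prob, so the
        # expected number of events per base is 1 / (1 - insert_prob).
        while rng.random() < insert_prob:
            events.append(level)
    events = np.array(events, dtype=float)
    return events + rng.normal(0.0, noise_sd, size=events.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gold = rng.normal(100.0, 10.0, size=200)  # stand-in current levels
    p = 1.0 - 1.0 / 1.7                       # targets 1.7 events per base
    noisy = mock_squiggle(gold, p, noise_sd=2.0, rng=rng)
    print("events per base:", len(noisy) / len(gold))
    print("DTW distance:", dtw_distance(z_normalize(gold), z_normalize(noisy)))
```

Under this insertion-only scheme with zero noise, DTW aligns the repeated events back onto the gold-standard levels at no cost, so any remaining DTW distance is attributable to the noise term. That separation is the kind of known-distortion check a simulation model allows.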
