Langouche Lennart, Aralar April, Sinha Mridu, Lawrence Shelley M, Fraley Stephanie I, Coleman Todd P
Department of Nanoengineering, University of California, San Diego, La Jolla, CA 92093, USA.
Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA.
Bioinformatics. 2021 Apr 1;36(22-23):5337-5343. doi: 10.1093/bioinformatics/btaa1053.
The need to rapidly screen complex samples for a wide range of nucleic acid targets, such as those associated with infectious diseases, remains unmet. Digital High-Resolution Melt (dHRM) is an emerging technology with the potential to meet this need by accomplishing broad-based, rapid nucleic acid sequence identification. Here, we set out to develop a computational framework for estimating the resolving power of dHRM technology for defined sequence profiling tasks. By deriving noise models from experimentally generated dHRM datasets and applying these to in silico predicted melt curves, we enable the production of synthetic dHRM datasets that faithfully recapitulate real-world variations arising from sample and machine variables. We then use these datasets to identify the most challenging melt curve classification tasks likely to arise for a given application and test the performance of benchmark classifiers.
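The idea of applying a noise model to a predicted melt curve can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the sigmoidal curve shape, the function names `ideal_melt_curve` and `synthesize_curves`, and the noise parameters `shift_sd` and `noise_sd` are hypothetical and are not taken from the published noise models or the linked repository.

```python
import numpy as np

def ideal_melt_curve(temps, tm, width=1.0):
    """Idealized sigmoid: helical (double-stranded) fraction versus temperature.

    Assumption: a logistic curve stands in for an in silico predicted melt curve.
    """
    return 1.0 / (1.0 + np.exp((temps - tm) / width))

def synthesize_curves(temps, tm, n_curves, rng, shift_sd=0.2, noise_sd=0.01):
    """Generate noisy replicates of one melt curve.

    Assumed noise model (hypothetical): a random per-curve temperature shift
    plus independent Gaussian noise on every fluorescence reading.
    """
    shifts = rng.normal(0.0, shift_sd, size=n_curves)
    clean = ideal_melt_curve(temps[None, :], tm + shifts[:, None])
    return clean + rng.normal(0.0, noise_sd, size=clean.shape)

rng = np.random.default_rng(0)
temps = np.linspace(60.0, 95.0, 351)  # 0.1 degree C steps
curves = synthesize_curves(temps, tm=80.0, n_curves=50, rng=rng)
print(curves.shape)
```

Each row of `curves` is one synthetic replicate. In a real workflow the shift and noise distributions would be fitted to experimental dHRM data rather than chosen by hand.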
This toolbox enables the in silico design and testing of broad-based dHRM screening assays and the selection of optimal classifiers. For an example application of screening common human bacterial pathogens, we show that human pathogens having the most similar sequences and melt curves are still reliably identifiable in the presence of experimental noise. Further, we find that ensemble methods outperform whole series classifiers for this task and are in some cases able to resolve melt curves with single-nucleotide resolution.
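How a classifier might be tested on synthetic curves can be sketched as follows. This is an illustrative nearest-centroid baseline, not one of the benchmark classifiers evaluated in the paper. The species names, melting temperatures, noise parameters, and the `simulate` helper are all hypothetical.

```python
import numpy as np

def simulate(temps, tm, n, rng, shift_sd=0.2, noise_sd=0.01):
    """Hypothetical noisy melt curves: shifted logistic curves plus Gaussian noise."""
    shifts = rng.normal(0.0, shift_sd, size=n)
    clean = 1.0 / (1.0 + np.exp(temps[None, :] - (tm + shifts[:, None])))
    return clean + rng.normal(0.0, noise_sd, size=clean.shape)

rng = np.random.default_rng(1)
temps = np.linspace(60.0, 95.0, 351)

# Two made-up targets whose melting temperatures differ by 1 degree C.
tms = {"species_A": 80.0, "species_B": 81.0}
names = list(tms)
train = {name: simulate(temps, tms[name], 100, rng) for name in names}
test = {name: simulate(temps, tms[name], 100, rng) for name in names}

# One centroid (mean training curve) per class, stacked as (n_classes, n_temps).
centroids = np.stack([train[name].mean(axis=0) for name in names])

def predict(curves):
    """Assign each curve to the class with the nearest centroid (Euclidean distance)."""
    dists = np.linalg.norm(curves[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

correct = sum(int((predict(test[name]) == i).sum()) for i, name in enumerate(names))
accuracy = correct / (100 * len(names))
print(f"held-out accuracy: {accuracy:.3f}")
```

Shrinking the gap between the two melting temperatures, or increasing `shift_sd`, makes the task harder and gives a rough feel for how resolving power depends on noise.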
Data and code are available at https://github.com/lenlan/dHRM-noise-modeling.
Supplementary data are available at Bioinformatics online.