Classifying Force Spectroscopy of DNA Pulling Measurements Using Supervised and Unsupervised Machine Learning Methods.

Author information

Karatay Durmus U, Zhang Jie, Harrison Jeffrey S, Ginger David S

Affiliation

Department of Chemistry, University of Washington, Seattle, Washington 98195, United States.

Publication information

J Chem Inf Model. 2016 Apr 25;56(4):621-9. doi: 10.1021/acs.jcim.5b00722. Epub 2016 Apr 4.

Abstract

Dynamic force spectroscopy (DFS) measurements on biomolecules typically require classifying thousands of repeated force spectra prior to data analysis. Here, we study classification of atomic force microscope-based DFS measurements using machine-learning algorithms in order to automate selection of successful force curves. Notably, we collect a data set that has a testable positive signal using photoswitch-modified DNA before and after illumination with UV (365 nm) light. We generate a feature set consisting of six properties of force-distance curves to train supervised models and use principal component analysis (PCA) for an unsupervised model. For supervised classification, we train random forest models for binary and multiclass classification of force-distance curves. Random forest models predict successful pulls with an accuracy of 94% and classify them into five classes with an accuracy of 90%. The unsupervised method using Gaussian mixture models (GMM) reaches an accuracy of approximately 80% for binary classification.
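The abstract names two workflows: random forests trained on per-curve features (binary and five-class), and an unsupervised PCA + GMM model. The sketch below shows how both could be set up with scikit-learn and NumPy. It rests on several assumptions. The abstract does not list the six force-distance features, so `synthetic_curve_features` and its Gaussian-blob data are hypothetical placeholders. The tree count, train/test split, and number of principal components are illustrative choices. This is not the authors' code, and the accuracies it prints on toy data say nothing about the published results.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

N_FEATURES = 6  # six properties per force-distance curve; the columns here are synthetic


def synthetic_curve_features(n_per_class=200, n_classes=2, seed=0):
    """Hypothetical stand-in for extracted curve features: each class is a
    Gaussian blob in 6-D, shifted by 3 units per class along every feature."""
    rng = np.random.default_rng(seed)
    X = np.vstack([
        rng.normal(loc=3.0 * k, scale=1.0, size=(n_per_class, N_FEATURES))
        for k in range(n_classes)
    ])
    y = np.repeat(np.arange(n_classes), n_per_class)
    return X, y


def random_forest_accuracy(X, y, seed=0):
    """Supervised: fit a random forest on labeled curves, return held-out accuracy.
    Works unchanged for binary and multiclass labels."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    clf.fit(X_train, y_train)
    return accuracy_score(y_test, clf.predict(X_test))


def pca_gmm_accuracy(X, y, n_components=2, seed=0):
    """Unsupervised, binary only: standardize features, project onto principal
    components, cluster with a two-component GMM, then score against labels."""
    Z = PCA(n_components=n_components).fit_transform(
        StandardScaler().fit_transform(X))
    clusters = GaussianMixture(n_components=2, random_state=seed).fit_predict(Z)
    # GMM cluster ids are arbitrary; with two clusters, take the better mapping.
    acc = accuracy_score(y, clusters)
    return max(acc, 1.0 - acc)


if __name__ == "__main__":
    X2, y2 = synthetic_curve_features(n_classes=2)  # e.g. successful vs. failed pulls
    X5, y5 = synthetic_curve_features(n_classes=5)  # five curve classes
    print("RF binary:", random_forest_accuracy(X2, y2))
    print("RF 5-class:", random_forest_accuracy(X5, y5))
    print("PCA+GMM binary:", pca_gmm_accuracy(X2, y2))
```

The GMM never sees the labels. They are used only afterwards, to measure how well the discovered clusters line up with successful and unsuccessful pulls. That is why the cluster-to-label flip is needed in the binary case.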
