Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable.

Authors

Peto Myron, Kloczkowski Andrzej, Honavar Vasant, Jernigan Robert L

Affiliation

Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011-3020, USA.

Publication information

BMC Bioinformatics. 2008 Nov 18;9:487. doi: 10.1186/1471-2105-9-487.

DOI:10.1186/1471-2105-9-487
PMID:19014713
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2655094/
Abstract

BACKGROUND

Using a standard Support Vector Machine (SVM) trained with Sequential Minimal Optimization (SMO), Naïve Bayes, and other machine learning algorithms, we are able to distinguish between two classes of protein sequences: those that fold to highly-designable conformations and those that fold to poorly- or non-designable conformations.

RESULTS

First, we generate all possible compact lattice conformations for the specified shape (a hexagon or a triangle) on the 2D triangular lattice. Then we generate all possible binary hydrophobic/polar (H/P) sequences and, using a specified energy function, thread them through all of these compact conformations. If a given sequence attains its lowest energy on a particular lattice conformation, we assume that the sequence folds to that conformation. Highly-designable conformations have many H/P sequences folding to them, while poorly-designable conformations have few or none. We classify sequences as folding to either highly- or poorly-designable conformations. We randomly selected subsets of the sequences belonging to highly-designable and poorly-designable conformations and used them to train several different standard machine learning algorithms.
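The enumerate-and-thread procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses a small 10-site triangular shape. The energy function is assumed to be the simple H/P contact energy (-1 per non-bonded H-H contact), because the abstract does not specify one. A sequence is taken to fold only when its lowest-energy conformation is unique, and the two chain directions of a path are kept as separate conformations for simplicity. All function names are hypothetical.

```python
from itertools import product

# Six neighbour offsets of the 2D triangular lattice in axial coordinates.
STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def triangle_sites(side):
    """Sites of a triangular shape with `side` points along each edge."""
    return [(x, y) for x in range(side) for y in range(side - x)]


def adjacent(a, b):
    """True if two sites are nearest neighbours on the triangular lattice."""
    return (b[0] - a[0], b[1] - a[1]) in STEPS


def contact_maps(sites):
    """Enumerate compact conformations (paths visiting every site once).

    Each conformation is stored as its contact map: the set of index pairs
    (i, j), j > i + 1, that sit on neighbouring sites. Rotated or mirrored
    copies of a path share one contact map and are therefore counted once.
    """
    site_set = set(sites)
    maps = set()

    def extend(path, visited):
        if len(path) == len(sites):
            maps.add(frozenset(
                (i, j)
                for i in range(len(path))
                for j in range(i + 2, len(path))
                if adjacent(path[i], path[j])
            ))
            return
        x, y = path[-1]
        for dx, dy in STEPS:
            nxt = (x + dx, y + dy)
            if nxt in site_set and nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                extend(path, visited)
                path.pop()
                visited.remove(nxt)

    for start in sites:
        extend([start], {start})
    return sorted(maps, key=sorted)  # fixed order for reproducibility


def energy(seq, contacts):
    """ASSUMED energy function: -1 for every non-bonded H-H contact."""
    return -sum(1 for i, j in contacts if seq[i] == "H" and seq[j] == "H")


def designability(side):
    """Map each conformation to the number of sequences that fold to it."""
    sites = triangle_sites(side)
    maps = contact_maps(sites)
    counts = {m: 0 for m in maps}
    for seq in product("HP", repeat=len(sites)):
        energies = [energy(seq, m) for m in maps]
        lowest = min(energies)
        if energies.count(lowest) == 1:  # unique ground state: seq "folds"
            counts[maps[energies.index(lowest)]] += 1
    return counts


if __name__ == "__main__":
    counts = designability(side=4)  # 10-site triangle, 2**10 sequences
    ranked = sorted(counts.values(), reverse=True)
    print(len(counts), "conformations; top designabilities:", ranked[:5])
```

Conformations whose count is large relative to the rest would play the role of the highly-designable class, and the sequences that fold to them would supply the positive training examples.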

CONCLUSION

Using these machine learning algorithms with ten-fold cross-validation, we are able to classify the two classes of sequences with high accuracy, in some cases exceeding 95%.
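The evaluation protocol can be sketched as follows. The code is a hedged illustration only: it hand-writes a per-position Bernoulli Naïve Bayes classifier (Naïve Bayes is one of the algorithms the abstract names) and scores it with ten-fold cross-validation. The data come from `toy_dataset`, a hypothetical generator whose labelling rule is invented purely so that the example has some signal. It stands in for the sequence subsets sampled in the paper, so the accuracy it reports says nothing about the paper's results.

```python
import math
import random


def train_naive_bayes(seqs, labels):
    """Fit a per-position Bernoulli Naive Bayes model with Laplace smoothing."""
    model = {}
    for cls in set(labels):
        members = [s for s, y in zip(seqs, labels) if y == cls]
        n = len(members)
        # P(residue k is 'H' | class), smoothed so no probability is 0 or 1.
        p_h = [(sum(s[k] == "H" for s in members) + 1) / (n + 2)
               for k in range(len(seqs[0]))]
        model[cls] = (math.log(n / len(seqs)), p_h)
    return model


def predict(model, seq):
    """Return the class with the highest log-posterior for one sequence."""
    def score(cls):
        log_prior, p_h = model[cls]
        return log_prior + sum(
            math.log(p if c == "H" else 1 - p) for c, p in zip(seq, p_h))
    return max(model, key=score)


def cross_validate(seqs, labels, folds=10, seed=0):
    """Mean accuracy over `folds` disjoint held-out partitions."""
    order = list(range(len(seqs)))
    random.Random(seed).shuffle(order)
    accuracies = []
    for f in range(folds):
        test_idx = set(order[f::folds])  # every folds-th shuffled index
        train = [i for i in order if i not in test_idx]
        model = train_naive_bayes([seqs[i] for i in train],
                                  [labels[i] for i in train])
        hits = sum(predict(model, seqs[i]) == labels[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / folds


def toy_dataset(n=400, length=10, seed=1):
    """Synthetic stand-in data, NOT the paper's sequences.

    Hypothetical rule: a sequence is labelled "high" when at least three of
    its first five residues are hydrophobic, otherwise "low".
    """
    rng = random.Random(seed)
    seqs = ["".join(rng.choice("HP") for _ in range(length)) for _ in range(n)]
    labels = ["high" if s[:5].count("H") >= 3 else "low" for s in seqs]
    return seqs, labels


if __name__ == "__main__":
    seqs, labels = toy_dataset()
    print("ten-fold CV accuracy:", round(cross_validate(seqs, labels), 3))
```

With real data, `seqs` and `labels` would instead hold the H/P sequences sampled from the highly- and poorly-designable conformations; swapping in another classifier only requires replacing `train_naive_bayes` and `predict`.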
