
Mining protein database using machine learning techniques.

Authors

Camargo Renata da Silva, Niranjan Mahesan

Affiliation

Department of Computer Science, The University of Sheffield, Regent Court, Sheffield, UK.

Publication information

J Integr Bioinform. 2008 Aug 25;5(2):106. doi: 10.2390/biecoll-jib-2008-106.

DOI: 10.2390/biecoll-jib-2008-106
PMID: 20134071
Abstract

With a large amount of information relating to proteins accumulating in databases widely available online, it is of interest to apply machine learning techniques that, by extracting underlying statistical regularities in the data, make predictions about the functional and evolutionary characteristics of unseen proteins. Such predictions can help reduce the space over which experiment designers need to search in order to improve our understanding of the biochemical properties. Previously it has been suggested that an integration of features computable by comparing a pair of proteins can be achieved by an artificial neural network, hence predicting the degree to which they may be evolutionarily related and homologous.
We compiled two datasets of pairs of proteins, each pair being characterised by seven distinct features. For the problem of separating remote homologous pairs from analogous pairs, we performed an exhaustive search through all possible combinations of features and note that a significant performance gain was obtained by the inclusion of sequence and structure information. We find that a linear classifier was enough to discriminate a protein pair at the family level. At the superfamily level, however, detecting remote homologous pairs was a relatively harder problem, and we find that nonlinear classifiers achieve significantly higher accuracies.
In this paper, we compare three different pattern classification methods on two problems formulated as detecting evolutionary and functional relationships between pairs of proteins and, from extensive cross-validation and feature-selection studies, quantify the average limits and uncertainties with which such predictions may be made. Feature selection points to a "knowledge gap" in currently available functional annotations. We demonstrate how the scheme may be employed in a framework to associate an individual protein with an existing family of evolutionarily related proteins.
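
The exhaustive search over feature combinations mentioned in the abstract can be illustrated with a minimal sketch. With seven features there are 2^7 - 1 = 127 non-empty subsets, so every one can be scored by cross-validation. Everything below is an assumption for illustration only: the data are synthetic, the function names are hypothetical, and the nearest-centroid classifier is a simple stand-in for a linear classifier, not one of the three methods compared in the paper.

```python
import itertools
import random


def cv_accuracy(rows, labels, features, k=5):
    """k-fold cross-validated accuracy of a nearest-centroid classifier
    that only looks at the feature indices in `features`."""
    n = len(rows)
    correct = 0
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        # Class centroids are estimated from the training rows only.
        centroids = {}
        for cls in (0, 1):
            members = [rows[i] for i in train_idx if labels[i] == cls]
            centroids[cls] = [
                sum(r[f] for r in members) / len(members) for f in features
            ]
        for i in test_idx:
            x = [rows[i][f] for f in features]
            dist = {
                cls: sum((a - b) ** 2 for a, b in zip(x, c))
                for cls, c in centroids.items()
            }
            if min(dist, key=dist.get) == labels[i]:
                correct += 1
    return correct / n


def exhaustive_feature_search(rows, labels, n_features):
    """Score every non-empty feature subset; best accuracy first,
    ties broken in favour of smaller subsets."""
    results = []
    for size in range(1, n_features + 1):
        for subset in itertools.combinations(range(n_features), size):
            results.append((cv_accuracy(rows, labels, subset), subset))
    results.sort(key=lambda r: (-r[0], len(r[1]), r[1]))
    return results


def make_synthetic_pairs(n=200, n_features=7, seed=0):
    """Hypothetical data: each row stands for one protein pair.
    Only features 0 and 3 carry class signal; the rest are noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n):
        label = i % 2  # 1 = "homologous", 0 = "analogous" (illustrative)
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 2.0 * label
        row[3] += 1.5 * label
        rows.append(row)
        labels.append(label)
    return rows, labels


rows, labels = make_synthetic_pairs()
ranked = exhaustive_feature_search(rows, labels, n_features=7)
print("subsets evaluated:", len(ranked))
print("best (accuracy, subset):", ranked[0])
```

An exhaustive search is practical here only because the feature count is small; the number of subsets doubles with every added feature, so larger feature sets would need a greedy or otherwise approximate selection strategy.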

Similar articles

1
Mining protein database using machine learning techniques.
J Integr Bioinform. 2008 Aug 25;5(2):106. doi: 10.2390/biecoll-jib-2008-106.
2
Global sequence properties for superfamily prediction: a machine learning approach.
J Integr Bioinform. 2009 Aug 23;6(1):109. doi: 10.2390/biecoll-jib-2009-109.
3
A machine learning based method for the prediction of secretory proteins using amino acid composition, their order and similarity-search.
In Silico Biol. 2008;8(2):129-40.
4
Statistical geometry based prediction of nonsynonymous SNP functional effects using random forest and neuro-fuzzy classifiers.
Proteins. 2008 Jun;71(4):1930-9. doi: 10.1002/prot.21838.
5
Accurate prediction of solvent accessibility using neural networks-based regression.
Proteins. 2004 Sep 1;56(4):753-67. doi: 10.1002/prot.20176.
6
Prediction of protein subcellular localization.
Proteins. 2006 Aug 15;64(3):643-51. doi: 10.1002/prot.21018.
7
Large-scale learning of structure-activity relationships using a linear support vector machine and problem-specific metrics.
J Chem Inf Model. 2011 Feb 28;51(2):203-13. doi: 10.1021/ci100073w. Epub 2011 Jan 5.
8
Protein classification based on text document classification techniques.
Proteins. 2005 Mar 1;58(4):955-70. doi: 10.1002/prot.20373.
9
Kernel methods for predicting protein-protein interactions.
Bioinformatics. 2005 Jun;21 Suppl 1:i38-46. doi: 10.1093/bioinformatics/bti1016.
10
AVID: an integrative framework for discovering functional relationships among proteins.
BMC Bioinformatics. 2005 Jun 1;6:136. doi: 10.1186/1471-2105-6-136.