Global sequence properties for superfamily prediction: a machine learning approach.

Authors

Dobson Richard J B, Munroe Patricia B, Caulfield Mark J, Saqi Mansoor A S

Affiliation

The William Harvey Research Institute, Bart's and the London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London, UK.

Publication information

J Integr Bioinform. 2009 Aug 23;6(1):109. doi: 10.2390/biecoll-jib-2009-109.

DOI: 10.2390/biecoll-jib-2009-109
PMID: 20134076

Abstract

Functional annotation of a protein sequence is difficult in the absence of experimental data or clear similarity to a sequence of known function. In this study, a simple set of sequence attributes based on physicochemical and predicted structural characteristics was used as input to machine learning methods. To improve performance by increasing the data available for training, a technique of sequence enrichment was explored. These methods were used to predict membership of 24 and 49 large and diverse protein superfamilies from the SCOP database. We found that the best performance was obtained using an enriched training dataset. Accuracies of 66.3% and 55.6% were achieved on datasets comprising 24 and 49 superfamilies with LibSVM and AdaBoostM1, respectively. The methods used here confirm that domains within superfamilies share global sequence properties. We show that machine learning models used to predict categories within the SCOP database can be significantly improved via a simple sequence enrichment step. These approaches can be used to complement profile methods for detecting distant relationships where function is difficult to infer.
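The abstract does not list the individual sequence attributes, so the following is only a minimal sketch of what a "global sequence property" vector might look like. It assumes that amino acid composition, mean Kyte-Doolittle hydropathy, and the fraction of charged residues are reasonable stand-ins for the physicochemical attributes. The function name `global_features` and the feature names are hypothetical, and predicted structural characteristics are not covered.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale (one value per standard residue).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def global_features(seq):
    """Return a dict of whole-sequence descriptors (illustrative choice,
    not the attribute set used in the paper)."""
    # Keep only the 20 standard residues; ambiguous codes such as X are dropped.
    residues = [r for r in seq.upper() if r in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")

    n = len(residues)
    counts = Counter(residues)

    features = {"length": float(n)}
    # Fractional composition of each amino acid.
    for aa in AMINO_ACIDS:
        features["frac_" + aa] = counts[aa] / n
    # Average hydropathy over the whole sequence.
    features["mean_hydropathy"] = sum(KYTE_DOOLITTLE[r] for r in residues) / n
    # Fraction of residues with a charged side chain (D, E, K, R).
    features["frac_charged"] = sum(counts[aa] for aa in "DEKR") / n
    return features


if __name__ == "__main__":
    example = global_features("MKTAYIAKQRQISFVKSHFSRQ")
    print(len(example), "features; mean hydropathy =",
          round(example["mean_hydropathy"], 3))
```

Each sequence maps to a fixed-length vector regardless of its length, which is what allows such features to be passed to a general-purpose classifier such as a support vector machine or a boosting ensemble.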


