A method for probabilistic mapping between protein structure and function taxonomies through cross training.

Authors

Gupta Kshitiz, Sehgal Vivek, Levchenko Andre

Affiliation

The Whitaker Institute for Biomedical Engineering, Johns Hopkins School of Medicine, Baltimore, MD, USA.

Publication

BMC Struct Biol. 2008 Oct 3;8:40. doi: 10.1186/1472-6807-8-40.

DOI:10.1186/1472-6807-8-40
PMID:18834528
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2573881/
Abstract

BACKGROUND

Predicting protein function from structure, and vice versa, is a partially solved problem, largely in the domain of biophysics and biochemistry. This underscores the need for computational and bioinformatics approaches to the problem. Large and organized latent knowledge on protein classification exists in the form of independently created protein classification databases. By creating probabilistic maps between classes of structural classification databases (e.g. SCOP) and classes of functional classification databases (e.g. PROSITE), the structure and function of proteins can be probabilistically related.

RESULTS

We demonstrate that PROSITE and SCOP have significant semantic overlap, in spite of their independent classification schemes. Training SCOP classifiers with PROSITE classes as attributes, and vice versa, improved the accuracy of Support Vector Machine classifiers for both SCOP and PROSITE. Novel attributes, 2-D elastic profiles and Blocks, were used to improve time complexity and accuracy. Many relationships between classes of SCOP and PROSITE were extracted using decision trees.
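
The cross-training idea in the paragraph above (using the classes of one taxonomy as attributes when classifying into the other) can be illustrated with a minimal sketch. It only shows how an attribute vector might be augmented; it is not the authors' implementation, and the classifier itself is omitted. The function names, the class identifiers (`PS_A`, `PS_B`, `PS_C`) and the base feature values are all hypothetical.

```python
from typing import List, Sequence


def one_hot(labels: Sequence[str], vocabulary: Sequence[str]) -> List[float]:
    """Encode a set of class labels as a 0/1 vector over a fixed vocabulary."""
    present = set(labels)
    return [1.0 if name in present else 0.0 for name in vocabulary]


def cross_training_vector(
    base_features: Sequence[float],
    other_taxonomy_labels: Sequence[str],
    other_vocabulary: Sequence[str],
) -> List[float]:
    """Append the other taxonomy's class memberships to the base attributes."""
    return list(base_features) + one_hot(other_taxonomy_labels, other_vocabulary)


# Hypothetical functional classes (placeholders, not real PROSITE entries).
function_vocabulary = ["PS_A", "PS_B", "PS_C"]

# Hypothetical protein: two made-up base attributes, annotated with PS_B only.
x = cross_training_vector([0.2, 0.7], ["PS_B"], function_vocabulary)
print(x)
```

A vector built this way could then be passed to any classifier of structural classes; swapping the roles of the two taxonomies gives the reverse direction.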

CONCLUSION

We demonstrate that the presented approach can discover new probabilistic relationships between classes of different taxonomies and render a more accurate classification. Extensive mappings between existing protein classification databases can be created to link the large amount of organized data. Probabilistic maps were created between classes of SCOP and PROSITE, allowing prediction of structure from function and vice versa. In our experiments, we also found that function is indeed more strongly related to structure than structure is to function.
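
One simple way to picture a probabilistic map between two taxonomies is a table of conditional frequencies estimated from proteins annotated in both. The sketch below is a frequency-counting stand-in under that assumption; the abstract states that the authors used Support Vector Machines and decision trees, so this is not their method. The class names (`F1`, `F2`, `S1`, `S2`) and the toy annotations are invented for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def conditional_map(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Estimate P(target | source) from (source, target) co-annotation pairs."""
    joint = defaultdict(Counter)
    for source, target in pairs:
        joint[source][target] += 1
    return {
        source: {t: n / sum(counts.values()) for t, n in counts.items()}
        for source, counts in joint.items()
    }


# Toy co-annotations as (function_class, structure_class); all values invented.
annotations = [("F1", "S1"), ("F1", "S1"), ("F2", "S1"), ("F2", "S2")]

# Map in each direction; the two tables are generally not symmetric.
structure_given_function = conditional_map(annotations)
function_given_structure = conditional_map([(s, f) for f, s in annotations])

print(structure_given_function)
print(function_given_structure)
```

Because the two directions are estimated separately, such a table makes it possible to compare how sharply one taxonomy predicts the other in each direction, which is the kind of asymmetry the conclusion describes.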

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2032/2573881/5b035bba15ff/1472-6807-8-40-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2032/2573881/55707c03a17f/1472-6807-8-40-1.jpg