Automatic rule generation for protein annotation with the C4.5 data mining algorithm applied on SWISS-PROT.

Author information

Kretschmann E, Fleischmann W, Apweiler R

Affiliation

The EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

Publication information

Bioinformatics. 2001 Oct;17(10):920-6.

DOI: 10.1093/bioinformatics/17.10.920
PMID: 11673236

Abstract

MOTIVATION

The gap between the amount of newly submitted protein data and reliable functional annotation in public databases is growing. Traditional manual annotation, which relies on literature curation and sequence analysis tools without automated annotation systems, cannot keep up with the ever-increasing quantity of submitted data. Automated supplements to manually curated databases, such as TrEMBL or GenPept, cover raw data but provide only limited annotation. To improve this situation, automatic tools are needed that support manual annotation, automatically increase the amount of reliable information, and help to detect inconsistencies in manually generated annotations.

RESULTS

A standard data mining algorithm was successfully applied to gain knowledge about the Keyword annotation in SWISS-PROT. In total, 11 306 rules were generated; they are provided in a database, can be applied to as yet unannotated protein sequences, and can be viewed using a web browser. The rules rely on the taxonomy of the organism in which the protein was found and on signature matches of its sequence. Statistical evaluation of the generated rules by cross-validation suggests that applying them to arbitrary proteins can generate 33% of their keyword annotation with an error rate of 1.5%. The coverage of the keyword annotation can be increased to 60% by tolerating a higher error rate of 5%.
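
As a rough illustration of how a C4.5-style learner might choose between a taxonomy attribute and a signature attribute when predicting a keyword, the minimal sketch below computes the gain ratio, the split criterion used by C4.5, on a toy data set. The records, the attribute values (`SIG_A`, `SIG_B`), and the function names are hypothetical. They are not taken from the paper, its source code, or SWISS-PROT.

```python
import math
from collections import Counter

# Hypothetical toy records: (taxonomy, signature hit, protein has the keyword).
# All values are invented for illustration; they are not SWISS-PROT data.
RECORDS = [
    ("Bacteria", "SIG_A", True),
    ("Bacteria", "SIG_A", True),
    ("Bacteria", "SIG_B", False),
    ("Eukaryota", "SIG_A", True),
    ("Eukaryota", "SIG_B", False),
    ("Eukaryota", "SIG_B", False),
]


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(records, attr_index):
    """Gain ratio of splitting on one attribute; the last field is the class."""
    labels = [r[-1] for r in records]
    # Group the class labels by the value of the chosen attribute.
    groups = {}
    for r in records:
        groups.setdefault(r[attr_index], []).append(r[-1])
    total = len(records)
    # Weighted entropy that remains after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy([r[attr_index] for r in records])
    return info_gain / split_info if split_info > 0 else 0.0


ratios = {
    "taxonomy": gain_ratio(RECORDS, 0),
    "signature": gain_ratio(RECORDS, 1),
}
best = max(ratios, key=ratios.get)
print(ratios, "-> split on", best)
```

In this toy set the signature attribute separates the keyword perfectly, so it receives the higher gain ratio and would be chosen as the first split. A full C4.5 implementation would also recurse on the resulting subsets and prune the tree, which this sketch omits.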

AVAILABILITY

The results of the automatic data mining process can be browsed at http://golgi.ebi.ac.uk:8080/Spearmint/. Source code is available upon request.

CONTACT

kretsch@ebi.ac.uk.
