Automatic extraction of gene ontology annotation and its correlation with clusters in protein networks.

Authors

Daraselia Nikolai, Yuryev Anton, Egorov Sergei, Mazo Ilya, Ispolatov Iaroslav

Affiliation

Ariadne Genomics, Inc, Rockville, MD 20850, USA.

Publication

BMC Bioinformatics. 2007 Jul 10;8:243. doi: 10.1186/1471-2105-8-243.

DOI:10.1186/1471-2105-8-243
PMID:17620146
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1940026/
Abstract

BACKGROUND

Uncovering the cellular roles of a protein is a task of tremendous importance and complexity that requires dedicated experimental work and, often, sophisticated data mining and processing tools. A protein's functions, often referred to as its annotations, are believed to manifest themselves through the topology of networks of protein-protein interactions. In particular, there is a growing body of evidence that proteins performing the same function are more likely to interact with each other than with proteins that have other functions. However, because functional annotation and protein network topology are often studied separately, the direct relationship between them has not been comprehensively demonstrated. Beyond its general biological significance, such a demonstration would further validate the data extraction and processing methods used to compose protein annotation and protein-protein interaction datasets.

RESULTS

We developed a method for automatic extraction of protein functional annotation from scientific text based on Natural Language Processing (NLP) technology. For the protein annotation extracted from the entire PubMed, we evaluated the precision and recall rates, and compared the performance of the automatic extraction technology to that of the manual curation used in public Gene Ontology (GO) annotation. In the second part of the study, we carried out a large-scale investigation into the correspondence between communities in literature-based protein networks and GO annotation groups of functionally related proteins. We found a comprehensive two-way match: proteins within biological annotation groups form significantly more densely linked network clusters than expected by chance and, conversely, densely linked network communities exhibit a pronounced non-random overlap with GO groups. We also expanded the publicly available GO biological process annotation using the relations extracted by our NLP technology. An increase in the number and size of GO groups, without any noticeable decrease in the link density within the groups, indicated that this expansion significantly broadens the public GO annotation without diluting its quality. We found that functional GO annotation correlates mostly with clustering in a physical interaction protein network, while its overlap with indirect regulatory network communities is two to three times smaller.
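
The comparison of link density inside an annotation group with the density expected by chance can be illustrated with a minimal sketch. The abstract does not specify how the authors computed the chance expectation, so the random-sampling baseline below is an assumption, and the function names and the toy network are hypothetical.

```python
import random


def link_density(edges, group):
    """Fraction of all possible protein pairs inside `group` that are linked.

    `edges` is an iterable of (protein_a, protein_b) tuples, with each
    undirected link listed once (an assumption of this sketch).
    """
    members = set(group)
    n = len(members)
    if n < 2:
        return 0.0
    links = sum(1 for a, b in edges if a != b and a in members and b in members)
    return links / (n * (n - 1) / 2)


def random_expectation(edges, nodes, group_size, trials=1000, seed=0):
    """Mean link density of randomly drawn groups of the same size.

    This simple baseline ignores node degree; it is not the null model
    of the paper, only an illustration of "expected by chance".
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += link_density(edges, rng.sample(nodes, group_size))
    return total / trials


# Hypothetical toy network: A, B, C form a fully linked triangle.
nodes = ["A", "B", "C", "D", "E", "F", "G", "H"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "E"), ("F", "G")]

observed = link_density(edges, ["A", "B", "C"])
baseline = random_expectation(edges, nodes, group_size=3)
print(observed, baseline)
```

A group whose observed density is well above the baseline would count as a dense cluster in this simplified sense.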

CONCLUSION

Protein functional annotations extracted by the NLP technology expand and enrich the existing GO annotation system. The GO functional modularity correlates mostly with the clustering in the physical interaction network, suggesting an essential role for the structural organization maintained by these interactions. Reciprocally, clustering of proteins in physical interaction networks can serve as evidence for their functional similarity.
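
One common way to judge whether the overlap between a network cluster and a GO group is non-random is a hypergeometric tail probability. The abstract does not state which test the authors used, so this is only a sketch under that assumption; the function name and the example numbers are hypothetical.

```python
from math import comb


def overlap_p_value(population, group_size, cluster_size, overlap):
    """Probability of seeing at least `overlap` shared proteins by chance.

    Models drawing `cluster_size` proteins at random from `population`
    proteins, of which `group_size` belong to the GO group
    (hypergeometric upper tail).
    """
    max_overlap = min(group_size, cluster_size)
    total = comb(population, cluster_size)
    tail = sum(
        comb(group_size, k) * comb(population - group_size, cluster_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical example: a 3-protein cluster lies entirely inside a
# 3-protein GO group in a network of 10 proteins.
print(overlap_p_value(population=10, group_size=3, cluster_size=3, overlap=3))
```

A small value indicates that the cluster shares more proteins with the GO group than random membership would explain, which is the sense in which clustering can serve as evidence of functional similarity.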

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f0b8/1940026/30862a2a1319/1471-2105-8-243-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f0b8/1940026/3afe0cafe9b4/1471-2105-8-243-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f0b8/1940026/514646f33a27/1471-2105-8-243-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f0b8/1940026/5f22974a0793/1471-2105-8-243-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f0b8/1940026/451dab07600f/1471-2105-8-243-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f0b8/1940026/54a387444bf3/1471-2105-8-243-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f0b8/1940026/a1c210127b6a/1471-2105-8-243-7.jpg