Predicting protein functions by applying predicate logic to biomedical literature.

Affiliation

Department of Electrical and Computer Engineering, Khalifa University, Abu Dhabi, United Arab Emirates.

Publication information

BMC Bioinformatics. 2019 Feb 8;20(1):71. doi: 10.1186/s12859-019-2594-y.

DOI: 10.1186/s12859-019-2594-y
PMID: 30736739
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6368809/
Abstract

BACKGROUND

A large number of computational methods have been proposed for predicting protein functions. Most of them predict the functions of an unannotated protein p from already annotated proteins whose characteristics are similar to those of p. Recent Information Extraction methods take advantage of the rapid growth of biomedical literature to predict protein functions: they extract, from biomedical texts, biological molecule terms that directly describe protein functions. However, they consider only explicitly mentioned terms that co-occur with proteins in texts. We observe that some important biological molecule terms pertaining to functional categories may co-occur with proteins only implicitly. Methods that rely solely on explicitly mentioned terms may therefore miss vital functional information that the texts convey implicitly.

RESULTS

To overcome the limitations of methods that rely solely on explicitly mentioned terms to predict protein functions, we propose in this paper an Information Extraction system called PL-PPF. The system predicts the functions of proteins from their co-occurrences with both explicitly and implicitly mentioned biological molecule terms that pertain to functional categories in biomedical literature. That is, PL-PPF combines statistical explicit term extraction techniques with logic-based implicit term extraction techniques. The statistical component predicts some of the functions of a protein by extracting, from the biomedical texts associated with the protein, the explicitly mentioned functional terms that directly describe its functions. The logic-based component predicts additional functions by inferring the functional terms that co-occur implicitly with the protein in those texts. The system first runs its statistical component to extract the explicitly mentioned functional terms, and then runs its logic-based component to infer additional functions of the protein. Our hypothesis is that important biological molecule terms pertaining to the functional categories of proteins are likely to co-occur implicitly with the proteins in biomedical texts. We evaluated PL-PPF experimentally and compared it with five other systems; PL-PPF showed better prediction performance.
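The two-stage flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PL-PPF itself: the abstract does not list the system's statistics or inference rules, so the term vocabulary `FUNCTIONAL_TERMS`, the rule table `RULES`, the function names, and the example sentences are all hypothetical, and simple forward chaining over "A implies B" rules stands in for the predicate-logic component.

```python
from collections import Counter

# Hypothetical vocabulary of functional terms (toy data, not from the paper).
FUNCTIONAL_TERMS = {"atp binding", "kinase activity", "phosphorylation"}

# Hypothetical implication rules of the form "term A implies term B" (toy data).
RULES = {
    "kinase activity": {"phosphorylation"},
    "phosphorylation": {"atp binding"},
}


def explicit_terms(protein, sentences, min_count=1):
    """Statistical step: count functional terms in sentences mentioning the protein."""
    counts = Counter()
    for sentence in sentences:
        low = sentence.lower()
        if protein.lower() in low:
            for term in FUNCTIONAL_TERMS:
                if term in low:
                    counts[term] += 1
    return {term for term, count in counts.items() if count >= min_count}


def implicit_terms(explicit, rules):
    """Logic step: forward-chain over the rules and return only newly inferred terms."""
    known = set(explicit)
    frontier = list(explicit)
    while frontier:
        term = frontier.pop()
        for implied in rules.get(term, ()):
            if implied not in known:
                known.add(implied)
                frontier.append(implied)
    return known - set(explicit)


def predict_functions(protein, sentences):
    """Run the explicit step first, then the inference step, as in the abstract."""
    explicit = explicit_terms(protein, sentences)
    implicit = implicit_terms(explicit, RULES)
    return explicit, implicit


sentences = [
    "CDK2 shows kinase activity during the G1/S transition.",
    "Cyclin E levels rise before S phase.",
]
explicit, implicit = predict_functions("CDK2", sentences)
print("explicit:", sorted(explicit))
print("implicit:", sorted(implicit))
```

In this toy run only "kinase activity" is stated alongside the protein; the other two terms are reached purely through the rules, which is the kind of implicit co-occurrence the abstract argues explicit-only methods miss.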

CONCLUSIONS

The experimental results showed that PL-PPF outperformed the other five systems, which indicates the effectiveness and practical viability of its combination of explicit and implicit techniques. We also evaluated two versions of PL-PPF: a complete version that adopts both the explicit and implicit techniques, and a version that adopts only the explicit term co-occurrence extraction techniques (i.e., without the predicate-logic inference rules). The complete version significantly outperformed the explicit-only version. We attribute this to the effectiveness of the rules of predicate logic in inferring functional terms that co-occur implicitly with proteins in biomedical texts. A demo application of PL-PPF can be accessed through the following link: http://ecesrvr.kustar.ac.ae:8080/plppf/.
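A comparison like the one between the complete and explicit-only versions is often scored with set-based precision, recall, and F1 against curated annotations. The sketch below shows that arithmetic only; the abstract does not say which metrics the authors used, and the gold set and both prediction sets are invented toy data rather than results from the paper.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 for one protein's predicted terms."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy annotations (hypothetical, for illustration only).
gold = {"kinase activity", "phosphorylation", "atp binding"}
explicit_only = {"kinase activity"}
complete = {"kinase activity", "phosphorylation", "atp binding", "dna binding"}

for name, predicted in [("explicit-only", explicit_only), ("complete", complete)]:
    p, r, f1 = precision_recall_f1(predicted, gold)
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

The toy numbers illustrate the trade-off an inference step introduces: recall rises because implied terms are recovered, while precision can drop if a rule fires incorrectly.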

Similar articles

1
Predicting protein functions by applying predicate logic to biomedical literature.
BMC Bioinformatics. 2019 Feb 8;20(1):71. doi: 10.1186/s12859-019-2594-y.
2
Predicting the functions of a protein from its ability to associate with other molecules.
BMC Bioinformatics. 2016 Jan 15;17:34. doi: 10.1186/s12859-016-0882-3.
3
Inferring the Functions of Proteins from the Interrelationships between Functional Categories.
IEEE/ACM Trans Comput Biol Bioinform. 2018 Jan-Feb;15(1):157-167. doi: 10.1109/TCBB.2016.2615608. Epub 2016 Oct 6.
4
Predicting protein function from biomedical text.
Annu Int Conf IEEE Eng Med Biol Soc. 2015;2015:3275-8. doi: 10.1109/EMBC.2015.7319091.
5
An Effective Disease Risk Indicator Tool.
Annu Int Conf IEEE Eng Med Biol Soc. 2020 Jul;2020:5284-5287. doi: 10.1109/EMBC44109.2020.9175994.
6
Personizing the prediction of future susceptibility to a specific disease.
PLoS One. 2021 Jan 6;16(1):e0243127. doi: 10.1371/journal.pone.0243127. eCollection 2021.
7
Combining learning and constraints for genome-wide protein annotation.
BMC Bioinformatics. 2019 Jun 17;20(1):338. doi: 10.1186/s12859-019-2875-5.
8
Logic minimization and rule extraction for identification of functional sites in molecular sequences.
BioData Min. 2012 Aug 16;5(1):10. doi: 10.1186/1756-0381-5-10.
9
Protein function prediction using text-based features extracted from the biomedical literature: the CAFA challenge.
BMC Bioinformatics. 2013;14 Suppl 3(Suppl 3):S14. doi: 10.1186/1471-2105-14-S3-S14. Epub 2013 Feb 28.
10
AVID: an integrative framework for discovering functional relationships among proteins.
BMC Bioinformatics. 2005 Jun 1;6:136. doi: 10.1186/1471-2105-6-136.

Cited by

1
PlantConnectome: A knowledge graph database encompassing >71,000 plant articles.
Plant Cell. 2025 Jul 1;37(7). doi: 10.1093/plcell/koaf169.