Automatic query generation using word embeddings for retrieving passages describing experimental methods.

Authors

Aydın Ferhat, Hüsünbeyi Zehra Melce, Özgür Arzucan

Affiliation

Department of Computer Engineering, Boğaziçi University, TR-34342 Bebek, Istanbul, Turkey.

Publication information

Database (Oxford). 2017 Jan 10;2017. doi: 10.1093/database/baw166. Print 2017.

DOI:10.1093/database/baw166
PMID:28077568
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5225401/
Abstract

Information regarding the physical interactions among proteins is crucial, since protein-protein interactions (PPIs) are central for many biological processes. The experimental techniques used to verify PPIs are vital for characterizing and assessing the reliability of the identified PPIs. A lot of information about PPIs and the experimental methods are only available in the text of the scientific publications that report them. In this study, we approach the problem of identifying passages with experimental methods for physical interactions between proteins as an information retrieval search task. The baseline system is based on query matching, where the queries are generated by utilizing the names (including synonyms) of the experimental methods in the Proteomics Standard Initiative-Molecular Interactions (PSI-MI) ontology. We propose two methods, where the baseline queries are expanded by including additional relevant terms. The first method is a supervised approach, where the most salient terms for each experimental method are obtained by using the term frequency-relevance frequency (tf.rf) metric over 13 articles from our manually annotated data set of 30 full text articles, which is made publicly available. On the other hand, the second method is an unsupervised approach, where the queries for each experimental method are expanded by using the word embeddings of the names of the experimental methods in the PSI-MI ontology. The word embeddings are obtained by utilizing a large unlabeled full text corpus. The proposed methods are evaluated on the test set consisting of 17 articles. Both methods obtain higher recall scores compared with the baseline, with a loss in precision. Besides higher recall, the word embeddings based approach achieves higher F-measure than the baseline and the tf.rf based methods. 
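
As a rough illustration of the supervised expansion step, the sketch below scores terms with tf.rf in its commonly cited form, tf × log2(2 + a / max(1, c)), where a and c are the numbers of positive and negative passages containing the term. The function name, the toy passages, and the choice to count at passage level are assumptions made for illustration; the abstract does not specify these details.

```python
import math
from collections import Counter


def tf_rf(positive_docs, negative_docs):
    """Score terms of the positive class as tf * log2(2 + a / max(1, c)).

    tf: total occurrences of the term in the positive passages
    a:  number of positive passages containing the term
    c:  number of negative passages containing the term
    """
    tf = Counter(t for doc in positive_docs for t in doc)
    pos_df = Counter(t for doc in positive_docs for t in set(doc))
    neg_df = Counter(t for doc in negative_docs for t in set(doc))
    return {
        term: tf[term] * math.log2(2 + pos_df[term] / max(1, neg_df[term]))
        for term in tf
    }


# Toy tokenized passages, invented for illustration only.
positive = [
    ["gst", "pull", "down", "assay"],
    ["pull", "down", "with", "gst", "beads"],
]
negative = [
    ["cells", "were", "grown", "with", "serum"],
    ["assay", "buffer", "was", "added"],
]

scores = tf_rf(positive, negative)
# The highest-scoring terms are candidates for expanding the baseline query.
top_terms = sorted(scores, key=scores.get, reverse=True)[:3]
```

Terms that occur often in passages annotated with a method and rarely elsewhere rise to the top, which is what makes them plausible additions to a name-based query.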
We also show that incorporating gene name and interaction keyword identification leads to improved precision and F-measure scores for all three evaluated methods. The tf.rf based approach was developed as part of our participation in the Collaborative Biocurator Assistant Task of the BioCreative V challenge assessment, whereas the word embeddings based approach is a novel contribution of this article.

Database URL: https://github.com/ferhtaydn/biocemid/
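
The unsupervised expansion can be pictured as a nearest-neighbour lookup in embedding space. The sketch below is a minimal version under stated assumptions: the three-dimensional vectors are invented, and `expand_query`, `matches`, `top_k`, and `min_sim` are hypothetical names and thresholds, not values taken from the article or its repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(seed_terms, embeddings, top_k=2, min_sim=0.8):
    """Append to the seed terms their closest neighbours in embedding space."""
    expanded = list(seed_terms)
    for seed in seed_terms:
        if seed not in embeddings:
            continue
        neighbours = sorted(
            ((cosine(embeddings[seed], vec), term)
             for term, vec in embeddings.items() if term not in expanded),
            reverse=True,
        )
        expanded += [term for sim, term in neighbours[:top_k] if sim >= min_sim]
    return expanded


def matches(passage, query_terms):
    """Query matching: a passage is retrieved if it contains any query term."""
    text = passage.lower()
    return any(term in text for term in query_terms)


# Invented toy embeddings; real ones would be trained on a large unlabeled corpus.
embeddings = {
    "coimmunoprecipitation": [0.9, 0.1, 0.0],
    "immunoprecipitated": [0.85, 0.2, 0.05],
    "co-ip": [0.8, 0.15, 0.1],
    "centrifuged": [0.1, 0.9, 0.2],
    "buffer": [0.0, 0.3, 0.9],
}

baseline = ["coimmunoprecipitation"]  # method name as it might appear in an ontology
expanded = expand_query(baseline, embeddings)
passage = "Lysates were immunoprecipitated with anti-FLAG antibody."
```

Here the baseline query misses the passage because the method name never appears verbatim, while the expanded query retrieves it. This mirrors the recall gain reported in the abstract; the accompanying loss in precision comes from neighbours that are related but not specific to the method.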
