BioC-compatible full-text passage detection for protein-protein interactions using extended dependency graph.

Authors

Peng Yifan, Arighi Cecilia, Wu Cathy H, Vijay-Shanker K

Affiliations

Computer & Information Sciences, University of Delaware and Center for Bioinformatics & Computational Biology, University of Delaware, Newark, DE 19716, USA.

Publication information

Database (Oxford). 2016 May 11;2016. doi: 10.1093/database/baw072. Print 2016.

Abstract

The number of biomedical publications that report experimental results has grown substantially. Many of these results concern the detection of protein-protein interactions (PPIs). In BioCreative V, we participated in the BioC task and developed a PPI system to detect text passages containing PPIs in full-text articles. Because the system adopts the BioC format, its output can be added to a biocuration pipeline with little integration effort. A distinctive feature of our PPI system is that it uses an extended dependency graph, an intermediate level of representation that attempts to abstract away syntactic variations in text. As a result, we are able to use only a limited set of rules to extract PPI pairs from sentences, plus further rules to detect additional passages for those PPI pairs. For evaluation, we used the 95 articles that were provided for the BioC annotation task. We retrieved the unique PPIs for these articles from the BioGRID database and show that our system achieves a recall of 83.5%. To evaluate the detection of passages with PPIs, we further annotated the Abstract and Results sections of 20 documents from the dataset and obtained an F-value of 80.5%. To evaluate the generalizability of the system, we also conducted experiments on AIMed, a well-known PPI corpus. We achieved an F-value of 76.1% for sentence detection and an F-value of 64.7% for unique PPI detection. Database URL: http://proteininformationresource.org/iprolink/corpora.
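The recall and F-value figures above are set-based measures over unique PPIs. A minimal sketch of how such scores can be computed follows; it assumes an interaction is an unordered, case-insensitive pair of protein names, which is an illustrative choice and not necessarily the matching criterion used by the authors. The function names and the sample protein names are hypothetical.

```python
def normalize_pair(a, b):
    """Represent an interaction as an unordered, case-insensitive pair.

    Assumption for illustration: (A, B) and (b, a) count as the same PPI.
    """
    return frozenset((a.strip().lower(), b.strip().lower()))


def evaluate_unique_ppis(predicted, gold):
    """Return (precision, recall, f_value) over unique PPI pairs."""
    pred = {normalize_pair(a, b) for a, b in predicted}
    ref = {normalize_pair(a, b) for a, b in gold}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-value: harmonic mean of precision and recall.
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value


# Hypothetical system output and reference set, not data from the paper.
predicted = [("ProtA", "ProtB"), ("protb", "prota"), ("ProtC", "ProtD")]
gold = [("ProtA", "ProtB"), ("ProtE", "ProtF")]
print(evaluate_unique_ppis(predicted, gold))  # (0.5, 0.5, 0.5)
```

Duplicate mentions of the same pair collapse into one entry, so the scores reflect unique interactions per article set and not the number of sentences that mention them.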

