
Bio-SODA UX: enabling natural language question answering over knowledge graphs with user disambiguation.

Authors

Sima Ana Claudia, Mendes de Farias Tarcisio, Anisimova Maria, Dessimoz Christophe, Robinson-Rechavi Marc, Zbinden Erich, Stockinger Kurt

Affiliations

SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.

Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.

Publication information

Distrib Parallel Databases. 2022;40(2-3):409-440. doi: 10.1007/s10619-022-07414-w. Epub 2022 Jul 16.

DOI:10.1007/s10619-022-07414-w
PMID:36097541
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9458692/
Abstract

The problem of natural language processing over structured data has become a growing research field, both within the relational database and the Semantic Web community, with significant efforts involved in question answering over knowledge graphs (KGQA). However, many of these approaches are either specifically targeted at question answering using DBpedia, or require translating a natural language question to SPARQL in order to query the knowledge graph. Hence, these approaches often cannot be applied directly to complex scientific datasets where no prior training data is available. In this paper, we focus on the challenges of natural language processing over knowledge graphs of scientific datasets. In particular, we introduce Bio-SODA, a natural language processing engine that does not require training data in the form of question-answer pairs for generating SPARQL queries. Bio-SODA uses a generic graph-based approach for translating user questions to a ranked list of SPARQL candidate queries. Furthermore, Bio-SODA uses a novel ranking algorithm that includes node centrality as a measure of relevance for selecting the best SPARQL candidate query. Our experiments with real-world datasets across several scientific domains, including the official Question Answering over Linked Data (QALD) challenge, as well as the CORDIS dataset of European projects, show that Bio-SODA outperforms publicly available KGQA systems by an F1-score of at least 20% and by an even higher factor on more complex bioinformatics datasets. Finally, we introduce Bio-SODA UX, a graphical user interface designed to assist users in the exploration of large knowledge graphs and in dynamically disambiguating natural language questions that target the data available in these graphs.
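
The idea of ranking candidate queries with node centrality can be illustrated with a small sketch. This is not Bio-SODA's actual algorithm: the abstract only says that node centrality is one measure of relevance, so the use of PageRank, the equal weighting against string similarity, the toy schema in `EDGES`, and the `http://example.org/schema#` prefix are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical toy schema graph; not taken from any real dataset.
EDGES = [
    ("Gene", "Protein"),
    ("Gene", "Species"),
    ("Gene", "Expression"),
    ("Expression", "AnatomicalEntity"),
    ("Protein", "Disease"),
    ("GeneSynonym", "Gene"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def pagerank(adj, damping=0.85, iterations=50):
    """Plain power-iteration PageRank (assumed stand-in for 'node centrality')."""
    n = len(adj)
    rank = {node: 1.0 / n for node in adj}
    for _ in range(iterations):
        new_rank = {}
        for node in adj:
            incoming = sum(rank[nb] / len(adj[nb]) for nb in adj[node])
            new_rank[node] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


def string_similarity(a, b):
    """Case-insensitive similarity between a keyword and a node label."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_candidates(keyword, adj, weight=0.5):
    """Score each node by string match and normalized centrality, best first."""
    centrality = pagerank(adj)
    top = max(centrality.values())
    scored = []
    for node in adj:
        sim = string_similarity(keyword, node)
        score = weight * sim + (1 - weight) * centrality[node] / top
        scored.append((score, node))
    scored.sort(reverse=True)
    return [(node, round(score, 3)) for score, node in scored]


def to_sparql(node, prefix="http://example.org/schema#"):
    """Turn a matched class into a minimal SPARQL candidate query."""
    return f"SELECT ?x WHERE {{ ?x a <{prefix}{node}> . }} LIMIT 10"


if __name__ == "__main__":
    adjacency = build_adjacency(EDGES)
    for node, score in rank_candidates("gene", adjacency):
        print(score, to_sparql(node))
```

In this sketch a keyword such as "gene" matches both `Gene` and `GeneSynonym` lexically, and the centrality term pushes the well-connected `Gene` class further ahead. A disambiguation interface in the spirit of Bio-SODA UX could present the ranked list and let the user pick the intended candidate.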

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/11d3/9458692/c371f7027e3f/10619_2022_7414_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/11d3/9458692/93ffb27f0793/10619_2022_7414_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/11d3/9458692/e9ddc8795a36/10619_2022_7414_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/11d3/9458692/5acc532ef8f7/10619_2022_7414_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/11d3/9458692/68835ea09be9/10619_2022_7414_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/11d3/9458692/405161883179/10619_2022_7414_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/11d3/9458692/11e367e909c3/10619_2022_7414_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/11d3/9458692/67dcf137c506/10619_2022_7414_Fig8_HTML.jpg
