BIOSSES: a semantic sentence similarity estimation system for the biomedical domain.

Authors

Sogancioglu Gizem, Öztürk Hakime, Özgür Arzucan

Affiliations

Department of Computer Engineering, Bogazici University, Istanbul, Turkey.

R&D and Special Projects Department, Yapı Kredi Technology, Istanbul, Turkey.

Publication

Bioinformatics. 2017 Jul 15;33(14):i49-i58. doi: 10.1093/bioinformatics/btx238.

DOI: 10.1093/bioinformatics/btx238
PMID: 28881973
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5870675/

Abstract

MOTIVATION

The amount of information available in textual format is rapidly increasing in the biomedical domain. Therefore, natural language processing (NLP) applications are becoming increasingly important to facilitate the retrieval and analysis of these data. Computing the semantic similarity between sentences is an important component in many NLP tasks including text retrieval and summarization. A number of approaches have been proposed for semantic sentence similarity estimation for generic English. However, our experiments showed that such approaches do not effectively cover biomedical knowledge and produce poor results for biomedical text.

METHODS

We propose several approaches for sentence-level semantic similarity computation in the biomedical domain, including string similarity measures and measures based on the distributed vector representations of sentences learned in an unsupervised manner from a large biomedical corpus. In addition, ontology-based approaches are presented that utilize general and domain-specific ontologies. Finally, a supervised regression based model is developed that effectively combines the different similarity computation metrics. A benchmark data set consisting of 100 sentence pairs from the biomedical literature is manually annotated by five human experts and used for evaluating the proposed methods.
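
The combination step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the two features (token-level Jaccard overlap and a bag-of-words cosine standing in for the learned sentence vectors), the function names, and the plain least-squares fit are all assumptions chosen to keep the example dependency-free.

```python
import math
from collections import Counter


def tokens(sentence):
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return sentence.lower().split()


def jaccard(a, b):
    """String-level similarity: shared tokens over all distinct tokens."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def cosine_bow(a, b):
    """Cosine over token counts; a stand-in for distributed sentence vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def features(a, b):
    """Feature vector for one sentence pair; the leading 1.0 is the intercept."""
    return [1.0, jaccard(a, b), cosine_bow(a, b)]


def fit_linear(X, y):
    """Ordinary least squares: solve (X^T X) w = X^T y by Gauss-Jordan elimination."""
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(n)]
        + [sum(row[i] * t for row, t in zip(X, y))]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        if abs(p) < 1e-12:
            raise ValueError("features are collinear; cannot fit weights")
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


def predict(weights, x):
    """Combined similarity score for one feature vector."""
    return sum(w * xi for w, xi in zip(weights, x))
```

In use, one would build `features(s1, s2)` for each annotated sentence pair, fit the weights against the mean expert score, and call `predict` on unseen pairs. With only 100 pairs in the benchmark, some form of cross-validation would be needed to obtain an honest estimate.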

RESULTS

The experiments showed that the supervised semantic sentence similarity computation approach obtained the best performance (0.836 correlation with gold-standard human annotations) and improved over the state-of-the-art domain-independent systems by up to 42.6% in terms of the Pearson correlation metric.
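
For readers unfamiliar with the evaluation metric, the Pearson correlation between system scores and human annotations can be computed as in the sketch below. The function name and the input lists are hypothetical; no values from the paper are reproduced here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Here `xs` would hold the predicted similarity for each sentence pair and `ys` the corresponding gold-standard score. A value near 1 means the system ranks and scales pairs much as the annotators did.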

AVAILABILITY AND IMPLEMENTATION

A web-based system for biomedical semantic sentence similarity computation, the source code, and the annotated benchmark data set are available at: http://tabilab.cmpe.boun.edu.tr/BIOSSES/

CONTACT

gizemsogancioglu@gmail.com or arzucan.ozgur@boun.edu.tr.
