CASBERT: BERT-based retrieval for compositely annotated biosimulation model entities.

Author information

Munarko Yuda, Rampadarath Anand, Nickerson David P

Affiliations

Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand.

The New Zealand Institute for Plant & Food Research Ltd., Auckland, New Zealand.

Publication information

Front Bioinform. 2023 Feb 14;3:1107467. doi: 10.3389/fbinf.2023.1107467. eCollection 2023.

DOI:10.3389/fbinf.2023.1107467
PMID:36865672
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9971925/
Abstract

Maximising the FAIRness of biosimulation models requires a comprehensive description of model entities such as reactions, variables, and components. The COmputational Modeling in BIology NEtwork (COMBINE) community encourages the use of the Resource Description Framework (RDF) with composite annotations that draw on ontologies to ensure completeness and accuracy. These annotations help scientists find models or detailed information to inform further reuse, such as model composition, reproduction, and curation. SPARQL has been recommended as a key standard for accessing semantic annotations in RDF, and it retrieves entities precisely. However, SPARQL is unsuitable for most repository users, who explore biosimulation models freely without adequate knowledge of ontologies, RDF structure, and SPARQL syntax. We propose here a text-based information retrieval approach, CASBERT, that is easy to use and can present candidate relevant entities from models across a repository's contents. CASBERT adapts Bidirectional Encoder Representations from Transformers (BERT): each composite annotation about an entity is converted into an entity embedding, which is then stored in a list of entity embeddings. For entity lookup, a query is transformed into a query embedding and compared with the entity embeddings, and the entities are then displayed in order of similarity. The list structure makes it possible to implement CASBERT as an efficient search engine product, with inexpensive addition, modification, and insertion of entity embeddings. To demonstrate and test CASBERT, we created a test dataset of query-entity pairs from the Physiome Model Repository and a static export of the BioModels database. Measured using Mean Average Precision and Mean Reciprocal Rank, we found that our approach can perform better than the traditional bag-of-words method.
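
The lookup and evaluation steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The `toy_embed` function is a hypothetical stand-in for a BERT encoder (it hashes character trigrams so the example runs without any model download), and the entity identifiers, annotation texts, and function names are all assumptions made for illustration. The sketch only shows the overall shape: a list of entity embeddings, ranking by cosine similarity, and the Mean Average Precision and Mean Reciprocal Rank calculations.

```python
import math
import zlib

DIM = 256  # assumed vector size for the toy encoder, not a BERT dimension


def toy_embed(text, dim=DIM):
    """Hypothetical stand-in for a BERT encoder: hash character trigrams
    into a fixed-size vector and L2-normalise it."""
    vec = [0.0] * dim
    s = text.lower()
    for i in range(len(s) - 2):
        vec[zlib.crc32(s[i:i + 3].encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a, b):
    # Both vectors are unit length, so the dot product equals the cosine.
    return sum(x * y for x, y in zip(a, b))


def build_index(annotations):
    """Turn {entity_id: annotation text} into a list of (id, embedding)."""
    return [(eid, toy_embed(text)) for eid, text in annotations.items()]


def search(index, query, top_k=10):
    """Embed the query, score every entity, return the top_k by similarity."""
    q = toy_embed(query)
    scored = [(eid, cosine(q, emb)) for eid, emb in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def reciprocal_rank(ranked_ids, relevant):
    """1 / rank of the first relevant entity, or 0.0 if none is retrieved."""
    for rank, eid in enumerate(ranked_ids, start=1):
        if eid in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked_ids, relevant):
    """Mean of precision@k taken at each rank k holding a relevant entity."""
    hits, total = 0, 0.0
    for rank, eid in enumerate(ranked_ids, start=1):
        if eid in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def evaluate(index, query_entity_pairs):
    """Return (MAP, MRR) over a list of (query, set of relevant ids)."""
    if not query_entity_pairs:
        return 0.0, 0.0
    aps, rrs = [], []
    for query, relevant in query_entity_pairs:
        ranked = [eid for eid, _ in search(index, query, top_k=len(index))]
        aps.append(average_precision(ranked, relevant))
        rrs.append(reciprocal_rank(ranked, relevant))
    n = len(query_entity_pairs)
    return sum(aps) / n, sum(rrs) / n


if __name__ == "__main__":
    # Made-up annotation texts, for illustration only.
    index = build_index({
        "glucose": "concentration of glucose in blood plasma",
        "sodium": "flux of sodium ions through membrane channel",
        "ventricle": "volume of left ventricle",
    })
    # Adding an entity is a plain list append, no re-indexing needed.
    index.append(("calcium", toy_embed("concentration of calcium in cytosol")))
    for entity_id, score in search(index, "glucose concentration in plasma"):
        print(entity_id, round(score, 3))
```

Because the index is a plain list, adding or replacing an entity embedding touches only that entry, which is one way to read the abstract's point about inexpensive updates. In a real system, `toy_embed` would be replaced by a sentence-level encoder built on BERT.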


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1238/9971925/f4d42cb888cb/fbinf-03-1107467-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1238/9971925/480f106103bd/fbinf-03-1107467-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1238/9971925/b27723650b03/fbinf-03-1107467-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1238/9971925/0acf070e489d/fbinf-03-1107467-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1238/9971925/8dd4561b1fa8/fbinf-03-1107467-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1238/9971925/1430ff2dce26/fbinf-03-1107467-g006.jpg
