Benchmarking infrastructure for mutation text mining.

Authors

Klein Artjom, Riazanov Alexandre, Hindle Matthew M, Baker Christopher Jo

Affiliation

Computer Science And Applied Statistics Department, University of New Brunswick, Saint John, Canada.

Publication details

J Biomed Semantics. 2014 Feb 25;5(1):11. doi: 10.1186/2041-1480-5-11.

DOI:10.1186/2041-1480-5-11
PMID:24568600
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3939821/
Abstract

BACKGROUND

Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems.

RESULTS

We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments.
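
The kind of performance metric mentioned above can be illustrated with a minimal sketch. The infrastructure itself computes metrics with SPARQL queries over RDF annotations; the example below instead uses plain Python sets as a stand-in, so it is not the authors' method. The `Annotation` fields, the `score` function, and the sample mutations are all hypothetical and chosen only for illustration.

```python
from typing import Dict, NamedTuple, Set


class Annotation(NamedTuple):
    """One mutation mention. All fields are hypothetical, not the paper's schema."""
    doc_id: str    # identifier of the annotated document
    start: int     # character offset where the mention begins
    end: int       # character offset where the mention ends
    mutation: str  # normalized mutation label, e.g. "A123T"


def score(gold: Set[Annotation], predicted: Set[Annotation]) -> Dict[str, float]:
    """Compare system output with gold annotations using exact matching."""
    true_pos = len(gold & predicted)    # annotations found in both sets
    false_pos = len(predicted - gold)   # system annotations absent from gold
    false_neg = len(gold - predicted)   # gold annotations the system missed

    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up sample data: three gold annotations, three system annotations,
# two of which match exactly.
gold = {
    Annotation("doc1", 10, 15, "A123T"),
    Annotation("doc1", 40, 45, "G56D"),
    Annotation("doc2", 5, 10, "L78P"),
}
predicted = {
    Annotation("doc1", 10, 15, "A123T"),
    Annotation("doc2", 5, 10, "L78P"),
    Annotation("doc2", 30, 35, "R9K"),
}

result = score(gold, predicted)
print(result)
```

Exact matching on all four fields is the strictest comparison; a real evaluation might relax it, for example by ignoring offsets for document-level mutation grounding. In the described infrastructure, an equivalent comparison would be expressed as a query over the annotation graph rather than as program code.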

CONCLUSION

We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aea6/3939821/33cacb1b9c5c/2041-1480-5-11-1.jpg
