Markup: A Web-Based Annotation Tool Powered by Active Learning.

Authors

Dobbie Samuel, Strafford Huw, Pickrell W Owen, Fonferko-Shadrach Beata, Jones Carys, Akbari Ashley, Thompson Simon, Lacey Arron

Affiliations

Health Data Research UK, Swansea University Medical School, Swansea University, Swansea, United Kingdom.

Swansea University Medical School, Swansea University, Swansea, United Kingdom.

Publication information

Front Digit Health. 2021 Jul 26;3:598916. doi: 10.3389/fdgth.2021.598916. eCollection 2021.

DOI:10.3389/fdgth.2021.598916
PMID:34713086
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8521860/
Abstract

Across various domains, such as health and social care, law, news, and social media, there are increasing quantities of unstructured texts being produced. These potential data sources often contain rich information that could be used for domain-specific and research purposes. However, the unstructured nature of free-text data poses a significant challenge for its utilisation due to the necessity of substantial manual intervention from domain experts to label embedded information. Annotation tools can assist with this process by providing functionality that enables the accurate capture and transformation of unstructured texts into structured annotations, which can be used individually, or as part of larger Natural Language Processing (NLP) pipelines. We present Markup (https://www.getmarkup.com/), an open-source, web-based annotation tool that is undergoing continued development for use across all domains. Markup incorporates NLP and Active Learning (AL) technologies to enable rapid and accurate annotation using custom user configurations, predictive annotation suggestions, and automated mapping suggestions to both domain-specific ontologies, such as the Unified Medical Language System (UMLS), and custom, user-defined ontologies. We demonstrate a real-world use case of how Markup has been used in a healthcare setting to annotate structured information from unstructured clinic letters, where captured annotations were used to build and test NLP applications.
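
The abstract does not say how Markup's Active Learning component selects or ranks its suggestions. As a rough illustration of the general idea only, the sketch below shows uncertainty sampling, one common Active Learning strategy: candidates whose predicted label distribution has the highest entropy are sent to the annotator first. The function names, example sentences, and probabilities are all hypothetical and are not taken from Markup.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_for_review(predictions, k=2):
    """Return the k candidate texts with the most uncertain predictions.

    predictions maps each candidate text to a list of label probabilities
    produced by some model (hypothetical here).
    """
    ranked = sorted(
        predictions,
        key=lambda text: entropy(predictions[text]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:k]


# Hypothetical model output: probabilities over three made-up label types.
predictions = {
    "Started on lamotrigine 50 mg daily.": [0.96, 0.02, 0.02],
    "Possible focal seizures last month.": [0.40, 0.35, 0.25],
    "Review in clinic in six months.": [0.55, 0.40, 0.05],
}

# The two least confident predictions are queued for human annotation.
queue = select_for_review(predictions, k=2)
print(queue)
```

In a loop of this kind, the annotator's corrections would be added to the training data and the model retrained before the next batch is selected; that retraining step is omitted here.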

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd3d/8521860/375043272825/fdgth-03-598916-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd3d/8521860/adf58de9561f/fdgth-03-598916-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd3d/8521860/53cff31dc040/fdgth-03-598916-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd3d/8521860/31faaa74d3f2/fdgth-03-598916-g0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd3d/8521860/71a9452a446e/fdgth-03-598916-g0005.jpg
