TAX-Corpus: Taxonomy based Annotations for Colonoscopy Evaluation.

Authors

Syed Shorabuddin, Angel Adam Jackson, Syeda Hafsa Bareen, Jennings Carole Franc, VanScoy Joseph, Syed Mahanazuddin, Greer Melody, Bhattacharyya Sudeepa, Al-Shukri Shaymaa, Zozus Meredith, Prior Fred, Tharian Benjamin

Affiliations

Department of Biomedical Informatics, University of Arkansas for Medical Sciences, U.S.A.

Department of Internal Medicine, Washington University, U.S.A.

Publication information

Biomed Eng Syst Technol Int Jt Conf BIOSTEC Revis Sel Pap. 2022 Feb;2022:162-169. doi: 10.5220/0010876100003123.

DOI:10.5220/0010876100003123
PMID:35300321
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8926426/
Abstract

Colonoscopy plays a critical role in screening of colorectal carcinomas (CC). Unfortunately, the data related to this procedure are stored in disparate documents, colonoscopy, pathology, and radiology reports respectively. The lack of integrated standardized documentation is impeding accurate reporting of quality metrics and clinical and translational research. Natural language processing (NLP) has been used as an alternative to manual data abstraction. Performance of Machine Learning (ML) based NLP solutions is heavily dependent on the accuracy of annotated corpora. Availability of large volume annotated corpora is limited due to data privacy laws and the cost and effort required. In addition, the manual annotation process is error-prone, making the lack of quality annotated corpora the largest bottleneck in deploying ML solutions. The objective of this study is to identify clinical entities critical to colonoscopy quality, and build a high-quality annotated corpus using domain specific taxonomies following standardized annotation guidelines. The annotated corpus can be used to train ML models for a variety of downstream tasks.
