BENTO: A Visual Platform for Building Clinical NLP Pipelines Based on CodaLab.

Authors

Jin Yonghao, Li Fei, Yu Hong

Affiliation

Department of Computer Science, University of Massachusetts Lowell, MA, USA.

Publication

Proc Conf Assoc Comput Linguist Meet. 2020 Jul;2020:95-100. doi: 10.18653/v1/2020.acl-demos.13.

DOI:10.18653/v1/2020.acl-demos.13

PMID:33223604

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7679080/

Abstract

CodaLab is an open-source web-based platform for collaborative computational research. Although CodaLab has gained popularity in the research community, its interface has limited support for creating reusable tools that can be easily applied to new datasets and composed into pipelines. In the clinical domain, natural language processing (NLP) on medical notes generally involves multiple steps, such as tokenization and named entity recognition. Because these steps require different tools, which are usually scattered across different publications, it is not easy for researchers to use them to process their own datasets. In this paper, we present BENTO, a workflow management platform with a graphical user interface (GUI) that is built on top of CodaLab, to facilitate the process of building clinical NLP pipelines. BENTO comes with a number of clinical NLP tools that have been pre-trained using medical notes and expert annotations and can be readily used for various clinical NLP tasks. It also allows researchers and developers to create their own custom tools (e.g., pre-trained NLP models) and use them in a controlled and reproducible way. In addition, the GUI enables researchers with limited computing background to compose tools into NLP pipelines and then apply the pipelines to their own datasets in a "what you see is what you get" (WYSIWYG) way. Although BENTO is designed for clinical NLP applications, the underlying architecture is flexible enough to be tailored to other domains.
