Defining a Preprocessing Pipeline for the MULTI-SITA Project and General Medical Italian Natural Language Data.

Authors

Cappello Alice, Mora Sara, Giacobbe Daniele Roberto, Bassetti Matteo, Giacomini Mauro

Affiliations

Clinica Malattie Infettive, IRCCS Ospedale Policlinico San Martino, Genoa, Italy.

Department of Informatics, Bioengineering, Robotics and System Engineering, University of Genoa, Genoa, Italy.

Publication information

Stud Health Technol Inform. 2023 Oct 20;309:48-52. doi: 10.3233/SHTI230737.

DOI: 10.3233/SHTI230737
PMID: 37869804
Abstract

The application of Natural Language Processing (NLP) to medical data has revolutionized different aspects of health care. The benefits of this technique extend to several areas, including chatbots that can provide medical assistance remotely. Every application of NLP depends on one essential first step: preprocessing the retrieved corpus. The raw data must be prepared so that it can be used efficiently in further analysis. Considerable progress has been made in this direction for the English language, but for other languages, such as Italian, the state of the art is not equally advanced, especially for texts containing technical medical terms. The aim of this work is to identify and develop a preprocessing pipeline suitable for medical data written in Italian. The pipeline was developed in a Python environment, employing the Enchant and NLTK modules and BERT- and BART-based models from Hugging Face. It was then tested on real typed conversations between patients and physicians regarding medical questions. The algorithm was developed within the MULTI-SITA project of the Italian Society of Anti-Infective Therapy (SITA), but its flexible structure can adapt to a large variety of data.
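The abstract names the tools used but not the individual steps, so the following is only an illustrative sketch of what a preprocessing pass over typed Italian patient messages might look like, not the authors' pipeline. To stay self-contained it relies on standard-library stand-ins: a regular-expression tokenizer instead of NLTK, and `difflib` fuzzy matching instead of an Enchant dictionary. The transformer-based steps are omitted entirely. The stop-word list, the vocabulary, and the function names are all hypothetical.

```python
import difflib
import re

# Hypothetical, deliberately tiny resources. A real pipeline would load a full
# Italian stop-word list and a dictionary that covers medical terminology.
ITALIAN_STOPWORDS = {"il", "lo", "la", "i", "gli", "le", "un", "una",
                     "di", "da", "e", "ho", "per", "che", "mi"}
KNOWN_WORDS = {"febbre", "antibiotico", "tosse", "giorni", "dolore",
               "preso", "tre"}

# Lowercase ASCII letters plus the accented vowels common in Italian.
TOKEN_RE = re.compile(r"[a-zàèéìòù]+")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return TOKEN_RE.findall(text.lower())


def correct(token, vocabulary=KNOWN_WORDS, cutoff=0.8):
    """Replace a likely typo with its closest vocabulary entry.

    Tokens that are already known, or that have no sufficiently close
    match, are returned unchanged.
    """
    if token in vocabulary or token in ITALIAN_STOPWORDS:
        return token
    matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def preprocess(text):
    """Tokenize, spell-correct, and drop stop words."""
    tokens = [correct(t) for t in tokenize(text)]
    return [t for t in tokens if t not in ITALIAN_STOPWORDS]


if __name__ == "__main__":
    message = "Ho la febre da tre giorni e ho preso un antibiotco."
    print(preprocess(message))
    # → ['febbre', 'tre', 'giorni', 'preso', 'antibiotico']
```

Correcting spelling before removing stop words keeps a mistyped content word from being lost. With a general-purpose dictionary, rare medical terms risk being "corrected" into common words, which is one reason a domain-aware vocabulary matters for this kind of text.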

Similar articles

1
Defining a Preprocessing Pipeline for the MULTI-SITA Project and General Medical Italian Natural Language Data.
Stud Health Technol Inform. 2023 Oct 20;309:48-52. doi: 10.3233/SHTI230737.
2
Use of "off-the-shelf" information extraction algorithms in clinical informatics: A feasibility study of MetaMap annotation of Italian medical notes.
J Biomed Inform. 2016 Oct;63:22-32. doi: 10.1016/j.jbi.2016.07.017. Epub 2016 Jul 18.
3
Automated classification of cancer morphology from Italian pathology reports using Natural Language Processing techniques: A rule-based approach.
J Biomed Inform. 2021 Apr;116:103712. doi: 10.1016/j.jbi.2021.103712. Epub 2021 Feb 18.
4
Social Reminiscence in Older Adults' Everyday Conversations: Automated Detection Using Natural Language Processing and Machine Learning.
J Med Internet Res. 2020 Sep 15;22(9):e19133. doi: 10.2196/19133.
5
Designing an openEHR-Based Pipeline for Extracting and Standardizing Unstructured Clinical Data Using Natural Language Processing.
Methods Inf Med. 2020 Dec;59(S 02):e64-e78. doi: 10.1055/s-0040-1716403. Epub 2020 Oct 14.
6
From narrative descriptions to MedDRA: automagically encoding adverse drug reactions.
J Biomed Inform. 2018 Aug;84:184-199. doi: 10.1016/j.jbi.2018.07.001. Epub 2018 Jul 4.
7
A Question-and-Answer System to Extract Data From Free-Text Oncological Pathology Reports (CancerBERT Network): Development Study.
J Med Internet Res. 2022 Mar 23;24(3):e27210. doi: 10.2196/27210.
8
Development of a generalizable natural language processing pipeline to extract physician-reported pain from clinical reports: Generated using publicly-available datasets and tested on institutional clinical reports for cancer patients with bone metastases.
J Biomed Inform. 2021 Aug;120:103864. doi: 10.1016/j.jbi.2021.103864. Epub 2021 Jul 12.
9
An Effective BERT-Based Pipeline for Twitter Sentiment Analysis: A Case Study in Italian.
Sensors (Basel). 2020 Dec 28;21(1):133. doi: 10.3390/s21010133.
10
Information Extraction from Medical Texts with BERT Using Human-in-the-Loop Labeling.
Stud Health Technol Inform. 2023 May 18;302:831-832. doi: 10.3233/SHTI230281.

Cited by

1
Year 2023 in Biomedical Natural Language Processing: a Tribute to Large Language Models and Generative AI.
Yearb Med Inform. 2024 Aug;33(1):241-248. doi: 10.1055/s-0044-1800751. Epub 2025 Apr 8.