E Oliveira Lucas Emanuel Silva, Gumiel Yohan Bonescki, Dos Santos Arnon Bruno Ventrilho, Cintho Lilian Mie Mukai, Carvalho Deborah Ribeiro, Hasan Sadid A, Moro Claudia Maria Cabral
Health Technology Program, Pontifical Catholic University of Paraná, Curitiba, PR, Brazil.
AI Lab, Philips Research North America, Cambridge, MA, USA.
Stud Health Technol Inform. 2019 Aug 21;264:123-127. doi: 10.3233/SHTI190196.
In this paper, we trained a set of Portuguese clinical word embedding models of different granularities on multi-specialty, multi-institutional clinical narrative datasets. We then assessed their impact on a downstream biomedical NLP task: the identification of urinary tract infection (UTI). Additionally, we evaluated our main model intrinsically using a version of Bio-SimLex adapted to Portuguese. Our empirical results showed that the larger, coarse-grained model slightly outperformed the smaller, fine-grained model on the proposed task. We also obtained satisfactory results in the Bio-SimLex intrinsic evaluation.
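The abstract does not spell out how the Bio-SimLex evaluation is scored. SimLex-style benchmarks are conventionally scored by taking the cosine similarity of each word pair's vectors and correlating those model scores with human similarity ratings using Spearman's rank correlation. The sketch below illustrates that convention only, under stated assumptions: the function names, the three-dimensional vectors, the Portuguese clinical words, and their ratings are all hypothetical toy values, not the paper's models, data, or exact protocol.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho as the Pearson correlation of the two rank vectors.

    Assumes each input has at least two distinct values.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def evaluate_similarity(embeddings, scored_pairs):
    """Correlate model cosine similarities with human similarity ratings.

    Pairs with an out-of-vocabulary word are skipped; returns (rho, pairs_used).
    """
    model_scores, human_scores = [], []
    for w1, w2, human in scored_pairs:
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            human_scores.append(human)
    return spearman(model_scores, human_scores), len(model_scores)


# Hypothetical toy embeddings (not trained vectors).
embeddings = {
    "febre": [0.9, 0.1, 0.0],
    "hipertermia": [0.85, 0.15, 0.05],
    "disúria": [0.1, 0.9, 0.1],
    "urina": [0.2, 0.8, 0.3],
    "fratura": [0.0, 0.1, 0.95],
}

# Hypothetical (word1, word2, human rating) triples on a 0-10 scale.
pairs = [
    ("febre", "hipertermia", 9.5),
    ("disúria", "urina", 7.0),
    ("febre", "disúria", 2.0),
    ("febre", "fratura", 0.5),
    ("pielonefrite", "febre", 5.0),  # out of vocabulary: skipped
]

rho, n = evaluate_similarity(embeddings, pairs)
print(f"Spearman rho = {rho:.3f} over {n} pairs")  # Spearman rho = 1.000 over 4 pairs
```

Reporting how many pairs were actually scored matters in practice: a model with a smaller vocabulary can look better or worse simply because it is evaluated on a different subset of the benchmark.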