Department of Computer Science, Aalto University, Espoo, Finland.
Department of Nursing Science, University of Turku, Turku, Finland.
Stud Health Technol Inform. 2022 May 25;294:854-858. doi: 10.3233/SHTI220606.
In health sciences, high-quality text embeddings may augment qualitative data analysis of large amounts of text by enabling, for example, searching and clustering of health information. This study evaluated three sentence-level embedding methods for clustering sentences in nursing narratives from individual patients' hospital care episodes. Two of the embeddings were generated by language models based on the BERT framework, and the third by the Sent2Vec method. Each method was used to cluster sentences from 20 patient care episodes, and the resulting clusters were evaluated manually. The findings suggest that the best clusters came from the embeddings of a BERT model fine-tuned on the proxy task of predicting subject headings for nursing text.
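To make the idea of clustering sentences by their embeddings concrete, the following is a minimal sketch only. The abstract does not say which clustering algorithm the study used, so k-means with cosine similarity is an assumption here. The example sentences and their 3-dimensional vectors are hypothetical toy values, not output from BERT or Sent2Vec, and the function name `kmeans_cosine` is illustrative.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine(a, b):
    """Cosine similarity of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def kmeans_cosine(embeddings, k, iterations=20):
    """Assign each embedding to one of k clusters using cosine similarity.

    Assumption: k-means is used purely for illustration; the source
    abstract does not specify the clustering algorithm.
    """
    points = [normalize(v) for v in embeddings]
    # Deterministic farthest-point initialisation: start from the first
    # point, then repeatedly add the point least similar to the chosen ones.
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = min(points, key=lambda p: max(cosine(p, c) for c in centroids))
        centroids.append(farthest)

    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its most similar centroid.
        new_labels = [max(range(k), key=lambda j: cosine(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the normalized mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = normalize(mean(members))
    return labels

# Hypothetical sentences with hand-made toy embeddings (not real model output).
sentences = [
    "Patient reports chest pain.",
    "Chest pain eased after medication.",
    "Slept well through the night.",
    "Patient slept without waking.",
]
embeddings = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.8, 0.3],
]

labels = kmeans_cosine(embeddings, k=2)
for sentence, label in zip(sentences, labels):
    print(label, sentence)
```

In practice the toy vectors would be replaced by sentence embeddings from whichever model is being compared, and the resulting groups would still need manual review, as in the study.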