US Dept. of Veterans Affairs, Nashville, TN.
Vanderbilt University, Nashville, TN.
AMIA Annu Symp Proc. 2022 Feb 21;2021:631-640. eCollection 2021.
Many clinical natural language processing methods rely on non-contextual word embedding (NCWE) or contextual word embedding (CWE) models. Yet few, if any, intrinsic evaluation benchmarks exist that compare embedding representations against clinician judgment. We developed intrinsic evaluation tasks for embedding models using a corpus of radiology reports: term pair similarity for NCWEs and cloze task accuracy for CWEs. Using surveys, we quantified the agreement between clinician judgment and embedding model representations. We compared embedding models trained on a custom radiology report corpus (RRC), a general corpus, and PubMed and MIMIC-III corpora (P&MC). Cloze task accuracy was equivalent for RRC and P&MC models. For term pair similarity, P&MC-trained NCWEs outperformed all other NCWE models (ρ 0.61 vs. 0.27-0.44). Among models trained on RRC, fastText models often outperformed other NCWE models, and spherical embeddings provided overly optimistic representations of term pair similarity.
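The two intrinsic tasks can be illustrated with a minimal sketch. It assumes, since the abstract does not specify these details, that NCWE term pair similarity is scored as the cosine similarity of the two term vectors, that agreement with clinicians is measured as Spearman's rank correlation (ties given average ranks) between those scores and the clinicians' survey ratings, and that cloze accuracy is the fraction of masked slots where the model's top prediction equals the expected term. The function names, terms, vectors, and ratings below are hypothetical toy values, not the study's data or code.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def term_pair_agreement(
    embeddings: Dict[str, Sequence[float]],
    rated_pairs: List[Tuple[str, str, float]],
) -> float:
    """Hypothetical helper: rank agreement between model similarity and clinician ratings."""
    model_scores = [cosine(embeddings[a], embeddings[b]) for a, b, _ in rated_pairs]
    clinician_scores = [rating for _, _, rating in rated_pairs]
    return spearman_rho(model_scores, clinician_scores)


def cloze_accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Hypothetical helper: share of masked slots where the top prediction is correct."""
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)


# Toy, made-up inputs for illustration only.
toy_embeddings = {
    "effusion": [1.0, 0.0],
    "fluid": [0.9, 0.1],
    "nodule": [0.0, 1.0],
    "mass": [0.2, 0.9],
}
toy_ratings = [("effusion", "fluid", 4.5), ("nodule", "mass", 4.0), ("effusion", "nodule", 1.0)]
print(term_pair_agreement(toy_embeddings, toy_ratings))
print(cloze_accuracy(["effusion", "nodule", "pneumothorax"], ["effusion", "mass", "pneumothorax"]))
```

Ranking-based correlation is used here because clinician survey ratings are ordinal, so only the relative ordering of term pairs, not the raw cosine values, is compared. A model whose similarities are uniformly inflated (for example, all pairs scored near 1) can therefore still be judged by whether it orders pairs the way clinicians do.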