Department of Computer Science, Vanderbilt University, Nashville, TN, USA.
Department of Computer Science, Vanderbilt University, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.
J Biomed Inform. 2018 Jul;83:63-72. doi: 10.1016/j.jbi.2018.05.014. Epub 2018 May 22.
Objective: Word embeddings project semantically similar terms into nearby points in a vector space. When trained on clinical text, these embeddings can be leveraged to improve keyword search and text highlighting. In this paper, we present methods to refine the selection of similar terms from multiple EMR-based word embeddings, and evaluate their performance quantitatively and qualitatively across multiple chart review tasks.
Materials and methods: Word embeddings were trained on each clinical note type in an EMR. These embeddings were then combined, weighted, and truncated to select a refined set of similar terms for use in keyword search and text highlighting. To evaluate the quality of the similar terms, we measured their information retrieval (IR) performance using precision-at-K (P@5, P@10). Additionally, a user study evaluated users' search term preferences, while a timing study measured the time to answer a question from a clinical chart.
Results: The refined terms outperformed the baseline method in information retrieval performance (e.g., increasing the average P@5 from 0.48 to 0.60). Additionally, the refined terms were preferred by most users and reduced the average time to answer a question.
Conclusion: Clinical information can be more quickly retrieved and synthesized when using semantically similar terms from multiple embeddings.
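The combine, weight, and truncate step and the precision-at-K metric described above can be illustrated with a minimal sketch. The abstract does not specify the weighting scheme, so the weighted sum of similarity scores used here is an assumption, and the function names, note types, terms, weights, and relevance judgments are all hypothetical values chosen only for illustration.

```python
from collections import defaultdict


def refine_similar_terms(neighbors_by_note_type, weights, top_n):
    """Combine per-embedding neighbor lists into one ranked, truncated list.

    Assumption: each candidate term is scored by a weighted sum of its
    similarity in every embedding where it appears. This is one plausible
    scheme, not necessarily the one used in the paper.
    """
    scores = defaultdict(float)
    for note_type, neighbors in neighbors_by_note_type.items():
        weight = weights.get(note_type, 0.0)
        for term, similarity in neighbors:
            scores[term] += weight * similarity
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_n]]


def precision_at_k(ranked_terms, relevant_terms, k):
    """Fraction of the top-k ranked terms that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for term in ranked_terms[:k] if term in relevant_terms)
    return hits / k


# Hypothetical nearest neighbors of one query term in two note-type embeddings.
neighbors = {
    "progress_note": [("mi", 0.8), ("infarction", 0.7), ("stemi", 0.6)],
    "discharge_summary": [("infarction", 0.9), ("nstemi", 0.5), ("cabg", 0.4)],
}
weights = {"progress_note": 0.5, "discharge_summary": 0.5}  # illustrative weights
relevant = {"infarction", "mi", "nstemi"}  # illustrative relevance judgments

refined = refine_similar_terms(neighbors, weights, top_n=4)
print(refined)
print(precision_at_k(refined, relevant, k=4))
```

In this toy example a term that appears in both embeddings accumulates score from each, so it rises in the combined ranking, and truncation to `top_n` drops the weakest candidates before P@K is computed.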