Chen Timothy L, Emerling Max, Chaudhari Gunvant R, Chillakuru Yeshwant R, Seo Youngho, Vu Thienkhai H, Sohn Jae Ho
University of California San Francisco (UCSF), Radiology and Biomedical Imaging, 505 Parnassus Ave, San Francisco, CA 94143, USA; University of Illinois College of Medicine, 1853 W Polk St, Chicago, IL 60612, USA.
University of California San Francisco (UCSF), Radiology and Biomedical Imaging, 505 Parnassus Ave, San Francisco, CA 94143, USA; University of California Berkeley, 2626 Hearst Ave, Berkeley, CA 94720, USA.
J Biomed Inform. 2021 Jan;113:103665. doi: 10.1016/j.jbi.2020.103665. Epub 2020 Dec 15.
There has been increasing interest in machine learning-based natural language processing (NLP) methods in radiology; however, models have often used word embeddings trained on general web corpora due to the lack of a radiology-specific corpus.
We examined the potential of Radiopaedia to serve as a general radiology corpus for producing radiology-specific word embeddings that could enhance performance on an NLP task involving radiological text.
Embeddings of dimension 50, 100, 200, and 300 were trained on articles collected from Radiopaedia using the GloVe algorithm and evaluated on analogy completion. A shallow neural network, taking as input either our trained embeddings or pre-trained Wikipedia 2014 + Gigaword 5 (WG) embeddings, was used to label the Radiopaedia articles. Labeling performance was evaluated with exact match accuracy and Hamming loss. Statistical significance was assessed with McNemar's test with continuity correction and the Benjamini-Hochberg correction, and with a 5×2 cross-validation paired two-tailed t-test.
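As a rough illustration of the evaluation measures above, the sketch below assumes the common vector-offset (3CosAdd) rule for analogy completion, in which "a is to b as c is to ?" is answered by the vocabulary word whose vector is most cosine-similar to b - a + c; the abstract does not say which scoring rule the study used. It also computes exact match accuracy and Hamming loss for multi-label predictions. The function names, the toy vectors, and the label matrices are hypothetical and were chosen only so the results are easy to check by hand.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def complete_analogy(emb, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector-offset (3CosAdd) rule:
    return the word whose vector is most cosine-similar to b - a + c,
    excluding the three query words themselves."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(emb[w], target))

def exact_match_accuracy(y_true, y_pred):
    """Share of samples whose whole predicted label vector equals the true one."""
    return sum(list(t) == list(p) for t, p in zip(y_true, y_pred)) / len(y_true)

def hamming_loss(y_true, y_pred):
    """Share of individual (sample, label) slots that are predicted wrongly."""
    n_labels = len(y_true[0])
    wrong = sum(ti != pi for t, p in zip(y_true, y_pred) for ti, pi in zip(t, p))
    return wrong / (len(y_true) * n_labels)

# Toy 3-D vectors invented for illustration only (not Radiopaedia or WG embeddings);
# the third axis plays the role of an "adjective form" direction.
toy = {
    "kidney": [1.0, 0.0, 0.0],
    "renal": [1.0, 0.0, 1.0],
    "liver": [0.0, 1.0, 0.0],
    "hepatic": [0.0, 1.0, 1.0],
    "lung": [0.5, 0.5, 0.0],
}
print(complete_analogy(toy, "kidney", "renal", "liver"))  # hepatic

# Hypothetical multi-label targets: rows are articles, columns are binary labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 0, 0]]
print(exact_match_accuracy(y_true, y_pred))  # 1 of 3 rows fully correct
print(hamming_loss(y_true, y_pred))          # 2 of 9 label slots wrong
```

Exact match accuracy credits an article only when every one of its labels is right, whereas Hamming loss scores each label separately, so the two metrics can rank models differently on the same predictions.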
For accuracy in the analogy task, 50-dimensional (50-D) Radiopaedia embeddings outperformed WG embeddings on tumor origin analogies (p < 0.05) and organ adjectives (p < 0.01), whereas WG embeddings tended to outperform on inflammation location and bone vs. muscle analogies (p < 0.01). The two embeddings performed comparably on the other subcategories. In the labeling task, the Radiopaedia-based model outperformed the WG-based model at 50, 100, 200, and 300-D for exact match accuracy (p < 0.001, p < 0.001, p < 0.01, and p < 0.05, respectively) and Hamming loss (p < 0.001, p < 0.001, p < 0.01, and p < 0.05, respectively).
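The significance tests behind these p-values can be sketched in plain Python. This is a hedged reconstruction, not the authors' code: the abstract does not state which comparisons each test was applied to, and one assumed reading is that McNemar's test compared per-question analogy correctness between the two embedding sets, the Benjamini-Hochberg procedure adjusted those p-values across analogy subcategories, and the 5×2 cross-validation paired t-test (Dietterich's 5x2cv test) compared labeling scores. All function names and example numbers are hypothetical.

```python
import math

def mcnemar_cc(b, c):
    """McNemar's test with continuity correction for paired right/wrong outcomes.
    b: items model A got right and model B got wrong; c: the reverse.
    Returns (chi-square statistic, p-value) with 1 degree of freedom."""
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of a difference
    chi2 = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # For 1 degree of freedom the chi-square upper tail equals erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up false discovery rate control)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def t5_two_tailed_p(t):
    """Two-tailed p-value of Student's t with 5 degrees of freedom,
    using the closed-form CDF that exists for odd degrees of freedom."""
    theta = math.atan(abs(t) / math.sqrt(5))
    cos_t = math.cos(theta)
    central = (2 / math.pi) * (theta + math.sin(theta) * (cos_t + (2 / 3) * cos_t ** 3))
    return max(0.0, 1 - central)

def five_by_two_cv_t(diffs):
    """Dietterich's 5x2cv paired t-test.
    diffs: five (d1, d2) pairs, one per replication of 2-fold cross-validation;
    d_j is model A's score minus model B's score on fold j.
    Returns (t statistic, two-tailed p-value with 5 degrees of freedom)."""
    variances = []
    for d1, d2 in diffs:
        mean = (d1 + d2) / 2
        variances.append((d1 - mean) ** 2 + (d2 - mean) ** 2)
    # Numerator: difference on the first fold of the first replication.
    t = diffs[0][0] / math.sqrt(sum(variances) / 5)
    return t, t5_two_tailed_p(t)

# Hypothetical counts and score differences, not the paper's data.
print(mcnemar_cc(10, 2))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
print(five_by_two_cv_t([(0.02, 0.04), (0.03, 0.01), (0.02, 0.02), (0.05, 0.03), (0.01, 0.03)]))
```

The 5x2cv statistic uses a single fold difference as its numerator but pools the variance estimate over all five replications; under the null hypothesis it is approximately t-distributed with 5 degrees of freedom, which is why a 5-df closed form suffices here. Where SciPy is available, `2 * scipy.stats.t.sf(abs(t), 5)` gives the same two-tailed p-value.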
We have developed a set of word embeddings from Radiopaedia and shown that they can preserve relevant medical semantics and augment performance on a radiology NLP task. Our results suggest that the cultivation of a radiology-specific corpus can benefit radiology NLP models in the future.