Predicting Semantic Similarity Between Clinical Sentence Pairs Using Transformer Models: Evaluation and Representational Analysis.

Author Information

Mark Ormerod, Jesús Martínez Del Rincón, Barry Devereux

Affiliation

Institute of Electronics, Communications & Information Technology, School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Belfast, United Kingdom.

Publication Information

JMIR Med Inform. 2021 May 26;9(5):e23099. doi: 10.2196/23099.

Abstract

BACKGROUND

Semantic textual similarity (STS) is a natural language processing (NLP) task that involves assigning a similarity score to 2 snippets of text based on their meaning. This task is particularly difficult in the domain of clinical text, which often features specialized language and the frequent use of abbreviations.
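To make the task concrete, here is a toy sketch of what STS data looks like. The sentences and scores below are invented for illustration (the clinical STS track scores each pair on a 0-5 similarity scale):

```python
# Hypothetical examples of the STS task: each pair of clinical sentences is
# assigned a human-annotated similarity score on a 0-5 scale.
# These sentences and scores are illustrative, not from the actual data set.
pairs = [
    ("He was started on aspirin 81 mg daily.",
     "Aspirin 81 mg was prescribed once daily.", 4.5),
    ("He was started on aspirin 81 mg daily.",
     "The patient denies chest pain.", 0.5),
]

for sent_a, sent_b, score in pairs:
    print(f"{score:.1f}\t{sent_a}  |  {sent_b}")
```

A system is evaluated by how well its predicted scores correlate with such human judgments across the test set.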

OBJECTIVE

We created an NLP system to predict similarity scores for sentence pairs as part of the Clinical Semantic Textual Similarity track in the 2019 n2c2/OHNLP Shared Task on Challenges in Natural Language Processing for Clinical Data. We subsequently sought to analyze the intermediary token vectors extracted from our models while processing a pair of clinical sentences to identify where and how representations of semantic similarity are built in transformer models.

METHODS

Given a clinical sentence pair, we take the average predicted similarity score across several independently fine-tuned transformers. In our model analysis, we investigated the relationship between the final model's loss and surface features of the sentence pairs and assessed the decodability and representational similarity of the token vectors generated by each model.
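The ensembling step can be sketched in a few lines. The per-model scores below are hypothetical stand-ins for the outputs of the independently fine-tuned transformers:

```python
import numpy as np

def ensemble_similarity(per_model_scores):
    """Average the predicted similarity score for each sentence pair
    across several independently fine-tuned models."""
    scores = np.asarray(per_model_scores, dtype=float)  # shape: (n_models, n_pairs)
    return scores.mean(axis=0)

# Hypothetical predictions from three fine-tuned transformers on two sentence pairs.
preds = [
    [4.2, 1.0],
    [4.0, 1.4],
    [4.4, 1.2],
]
print(ensemble_similarity(preds))
```

Simple score averaging like this requires no joint training of the ensemble members, which is why independently fine-tuned models can be combined directly.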

RESULTS

Our model achieved a correlation of 0.87 with the ground-truth similarity score, reaching 6th place out of 33 teams (with a first-place score of 0.90). In detailed qualitative and quantitative analyses of the model's loss, we identified the system's failure to correctly model semantic similarity when both sentences in a pair contain details of medical prescriptions, as well as its general tendency to overpredict semantic similarity given significant token overlap. The token vector analysis revealed divergent representational strategies for predicting textual similarity between bidirectional encoder representations from transformers (BERT)-style models and XLNet. We also found that a large amount of information relevant to predicting STS can be captured using a combination of a classification token and the cosine distance between sentence-pair representations in the first layer of a transformer model that did not produce the best predictions on the test set.
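The first-layer finding can be illustrated with a minimal sketch: a feature vector built by concatenating a [CLS]-style classification token vector with the cosine distance between pooled representations of the two sentences. The mean pooling and function names here are assumptions for illustration; the abstract does not specify the exact pooling or decoding setup:

```python
import numpy as np

def cosine_distance(u, v):
    """1 minus cosine similarity between two vectors."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def sts_features(cls_vec, sent_a_vecs, sent_b_vecs):
    """Combine a first-layer classification token vector with the cosine
    distance between pooled sentence representations.

    Mean pooling over token vectors is an assumption made for this sketch;
    the inputs stand in for real transformer hidden states."""
    a = np.mean(sent_a_vecs, axis=0)  # pooled representation of sentence A
    b = np.mean(sent_b_vecs, axis=0)  # pooled representation of sentence B
    return np.concatenate([cls_vec, [cosine_distance(a, b)]])

# Toy hidden states: a 4-dimensional CLS vector, two tokens per sentence.
cls = np.ones(4)
sent = np.array([[1.0, 0.0], [1.0, 0.0]])
feats = sts_features(cls, sent, sent)  # identical sentences -> distance 0
```

A lightweight probe trained on such features could then be compared against the full model, which is the sense in which early-layer representations "capture" STS-relevant information.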

CONCLUSIONS

We designed and trained a system that uses state-of-the-art NLP models to achieve very competitive results on a new clinical STS data set. As our approach uses no hand-crafted rules, it serves as a strong deep learning baseline for this task. Our key contribution is a detailed analysis of the model's outputs and an investigation of the heuristic biases learned by transformer models. We suggest future improvements based on these findings. In our representational analysis we explore how different transformer models converge or diverge in their representation of semantic signals as the tokens of the sentences are augmented by successive layers. This analysis sheds light on how these "black box" models integrate semantic similarity information in intermediate layers, and points to new research directions in model distillation and sentence embedding extraction for applications in clinical NLP.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/85a8/8190645/7e875db34f59/medinform_v9i5e23099_fig1.jpg
