Singh Ayush, Krishnamoorthy Saranya, Ortega John E
inQbator AI, Evernorth Health Services, Saint Louis, MO, USA.
J Healthc Inform Res. 2024 Jan 18;8(2):353-369. doi: 10.1007/s41666-023-00136-3. eCollection 2024 Jun.
One of the common tasks in clinical natural language processing is medical entity linking (MEL), which involves mention detection followed by linking the mention to an entity in a knowledge base. One reason MEL remains unsolved is ambiguity in language: the same text can resolve to several named entities. This problem is exacerbated in the text found in electronic health records. Recent work has shown that transformer-based deep learning models outperform previous methods on linking. We introduce NeighBERT, a custom pre-training technique that extends BERT (Devlin et al. [1]) by encoding how entities are related within a knowledge graph. This relational context has traditionally been missing from the original BERT, and adding it helps resolve the ambiguity found in clinical text. In our experiments, NeighBERT improves the precision, recall, and F1-score of the state of the art by 1-3 points for named entity recognition and 10-15 points for MEL on two widely known clinical datasets.
The online version contains supplementary material available at 10.1007/s41666-023-00136-3.
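To make the idea concrete, below is a minimal sketch, not the authors' implementation, of how knowledge-graph neighbors of a detected mention can be appended as extra relational context before encoding with an off-the-shelf BERT. The bert-base-uncased checkpoint, the toy neighbors dictionary, and the mean pooling are all illustrative assumptions; NeighBERT's actual pre-training objective is described in the paper.

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Toy knowledge-graph neighborhood: entity -> related entities (assumed data,
# standing in for relations drawn from a clinical knowledge graph).
neighbors = {
    "myocardial infarction": ["heart attack", "troponin", "chest pain"],
}

def encode_with_neighbors(mention: str, context: str) -> torch.Tensor:
    """Encode a clinical sentence paired with the mention's KG neighbors as a
    second segment, so self-attention can mix relational signal into the
    contextual embedding."""
    neighbor_text = "; ".join(neighbors.get(mention, []))
    inputs = tokenizer(context, neighbor_text,
                       return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean-pool the final hidden states as a simple fixed-size embedding.
    return outputs.last_hidden_state.mean(dim=1)

emb = encode_with_neighbors("myocardial infarction",
                            "Patient presented with an acute MI last week.")
print(emb.shape)  # torch.Size([1, 768])

Feeding the sentence and the neighbor list as a segment pair lets the encoder attend across both, which captures the intuition behind encoding graph neighborhoods during pre-training: an ambiguous surface form such as "MI" is pulled toward the entity whose neighbors best match the surrounding clinical context.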