Disorder recognition in clinical texts using multi-label structured SVM.

Authors

Lin Wutao, Ji Donghong, Lu Yanan

Affiliations

School of Electronics Engineering and Computer Science, Peking University, Beijing, 100871, China.

School of Computer, Wuhan University, Wuhan, 430072, China.

Publication information

BMC Bioinformatics. 2017 Jan 31;18(1):75. doi: 10.1186/s12859-017-1476-4.

Abstract

BACKGROUND

Information extraction from clinical texts helps medical workers identify patients' problems faster and makes intelligent diagnosis possible in the future. Disorder mention recognition in clinical narratives has been studied extensively, but recognizing more complicated mentions, such as overlapping ones, remains an open issue. This paper proposes a method for disorder mention recognition based on a multi-label structured Support Vector Machine (SVM). We present a multi-label scheme that can be used in complicated entity recognition tasks.
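To make the terms concrete, the sketch below models a mention as a set of token positions and checks whether it is contiguous and whether two mentions overlap. The sentence, the two mentions, and the helper names (`is_contiguous`, `overlaps`, `surface`) are invented for illustration; they are not taken from the paper or its data set.

```python
from typing import FrozenSet, List

# A mention is modelled as the set of token positions it covers (an assumption
# made for this sketch, not the paper's internal representation).
Mention = FrozenSet[int]


def is_contiguous(mention: Mention) -> bool:
    """Return True when the token positions form one unbroken run."""
    return max(mention) - min(mention) + 1 == len(mention)


def overlaps(a: Mention, b: Mention) -> bool:
    """Return True when the two mentions share at least one token."""
    return bool(a & b)


def surface(tokens: List[str], mention: Mention) -> str:
    """Join the tokens of a mention in sentence order."""
    return " ".join(tokens[i] for i in sorted(mention))


# Invented example sentence, not drawn from the evaluation data.
tokens = ["left", "atrium", "and", "ventricle", "are", "dilated"]
atrium_dilated: Mention = frozenset({1, 5})
ventricle_dilated: Mention = frozenset({3, 5})

for m in (atrium_dilated, ventricle_dilated):
    print(surface(tokens, m), "| contiguous:", is_contiguous(m))
print("overlap:", overlaps(atrium_dilated, ventricle_dilated))
```

Both invented mentions are discontiguous, and they share the token "dilated", which is the kind of case a one-tag-per-token scheme struggles to represent.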

RESULTS

We performed three sets of experiments to evaluate our model. Our best F-score on the 2013 Conference and Labs of the Evaluation Forum (CLEF) data set is 0.7343. Our multi-label scheme has six types of labels, each represented by a 24-bit binary number whose digits carry information about different disorder mentions. The method can recognize disorder mentions made up of contiguous or discontiguous words, as well as mentions whose spans overlap. The experiments indicate that our multi-label structured SVM model outperforms the conditional random field (CRF) model on this task and that our multi-label scheme surpasses the baseline. For overlapping disorder mentions in particular, the F-score of our multi-label scheme is 0.1428 higher than that of the baseline BIOHD1234 scheme.
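The abstract does not say how the 24 bits are laid out, so the following is only a sketch of how a bit-packed token label could carry tags for several mentions at once. It assumes a purely hypothetical layout of four slots of six bits each (chosen only because 6 × 4 = 24), with one slot per mention passing through a token. The tag names `BEGIN` and `INSIDE` and the functions `encode` and `decode` are likewise assumptions, not the authors' scheme.

```python
from typing import Dict

NUM_TAGS = 6                       # from the abstract: six label types
NUM_BITS = 24                      # from the abstract: 24-bit labels
NUM_SLOTS = NUM_BITS // NUM_TAGS   # ASSUMPTION: 4 slots, one per overlapping mention

# Hypothetical tag ids; the abstract does not name the six label types.
BEGIN, INSIDE = 0, 1


def encode(slot_tags: Dict[int, int]) -> int:
    """Pack {slot: tag} pairs into one integer label (one-hot tag per slot)."""
    label = 0
    for slot, tag in slot_tags.items():
        if not (0 <= slot < NUM_SLOTS and 0 <= tag < NUM_TAGS):
            raise ValueError(f"slot/tag out of range: {slot}, {tag}")
        label |= 1 << (slot * NUM_TAGS + tag)
    return label


def decode(label: int) -> Dict[int, int]:
    """Recover the {slot: tag} pairs from a packed label."""
    mask = (1 << NUM_TAGS) - 1
    result = {}
    for slot in range(NUM_SLOTS):
        field = (label >> (slot * NUM_TAGS)) & mask
        if field:
            # Each used slot holds exactly one set bit; its position is the tag id.
            result[slot] = field.bit_length() - 1
    return result


# A token shared by two overlapping mentions gets a tag in two slots.
shared_token = encode({0: INSIDE, 1: INSIDE})
print(format(shared_token, "024b"), decode(shared_token))
```

Under this assumed layout, a token that belongs to no mention is simply the label 0, and a sequence model such as a structured SVM or CRF would treat each distinct packed value as one output class.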

CONCLUSIONS

This multi-label structured SVM approach works well on the disorder recognition task. The novel multi-label scheme we present is superior to the baseline and can also be used with other models to solve various types of complicated entity recognition tasks.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a28e/5282630/5f091c081209/12859_2017_1476_Fig1_HTML.jpg
