Disorder recognition in clinical texts using multi-label structured SVM.

Authors

Lin Wutao, Ji Donghong, Lu Yanan

Affiliations

School of Electronics Engineering and Computer Science, Peking University, Beijing, 100871, China.

School of Computer, Wuhan University, Wuhan, 430072, China.

Publication information

BMC Bioinformatics. 2017 Jan 31;18(1):75. doi: 10.1186/s12859-017-1476-4.

DOI: 10.1186/s12859-017-1476-4

PMID: 28143488

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5282630/

Abstract

BACKGROUND

Information extraction from clinical texts helps medical workers identify patients' problems faster and makes intelligent diagnosis possible in the future. Much work has addressed disorder mention recognition in clinical narratives, but recognizing more complicated disorder mentions, such as overlapping ones, remains an open issue. This paper proposes a method for disorder mention recognition based on a multi-label structured Support Vector Machine (SVM). We present a multi-label scheme that could be used in complicated entity recognition tasks.

RESULTS

We performed three sets of experiments to evaluate our model. Our best F-score on the 2013 Conference and Labs of the Evaluation Forum (CLEF) data set is 0.7343. Our multi-label scheme has six types of labels, each represented by a 24-bit binary number. The binary digits of each label carry information about different disorder mentions. Our multi-label method can recognize not only disorder mentions made up of contiguous or discontiguous words but also mentions whose spans overlap with each other. The experiments indicate that our multi-label structured SVM model outperforms the conditional random field (CRF) model on this disorder mention recognition task, and that our multi-label scheme surpasses the baseline. For overlapping disorder mentions in particular, the F-score of our multi-label scheme is 0.1428 higher than that of the baseline BIOHD1234 scheme.
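To make the idea of bit-valued labels concrete, the following is a minimal sketch of how a per-token binary label can represent overlapping and discontiguous mentions. It is an illustration only, not the paper's scheme: the abstract does not describe the six label types or how the 24 bits are laid out, so the "one bit per mention slot" layout, the function names `encode` and `decode`, and the example sentence are all assumptions.

```python
from typing import List, Tuple

N_BITS = 24  # label width stated in the abstract; the slot layout below is assumed


def encode(n_tokens: int, mentions: List[Tuple[int, ...]]) -> List[int]:
    """Assign each token an integer whose bit k is set if the token is in mention k."""
    if len(mentions) > N_BITS:
        raise ValueError("more mentions than available bit slots")
    labels = [0] * n_tokens
    for slot, token_ids in enumerate(mentions):
        for i in token_ids:
            labels[i] |= 1 << slot
    return labels


def decode(labels: List[int]) -> List[Tuple[int, ...]]:
    """Recover each mention as the tuple of token indices sharing a bit slot."""
    mentions = []
    for slot in range(N_BITS):
        ids = tuple(i for i, lab in enumerate(labels) if (lab >> slot) & 1)
        if ids:
            mentions.append(ids)
    return mentions


# Hypothetical sentence with two overlapping, discontiguous mentions:
# "left atrium ... dilated" and "left ventricle ... dilated".
tokens = ["left", "atrium", "and", "ventricle", "are", "dilated"]
gold = [(0, 1, 5), (0, 3, 5)]

labels = encode(len(tokens), gold)
for tok, lab in zip(tokens, labels):
    print(f"{tok:10s} {lab:0{N_BITS}b}")
```

In this sketch, a token shared by both mentions ("left", "dilated") has two bits set, which a single-tag-per-token scheme cannot express directly. A sequence model would then predict one such label per token, and `decode` would turn the predicted labels back into mentions.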

CONCLUSIONS

This multi-label structured SVM approach works well on this disorder recognition task. The novel multi-label scheme we present is superior to the baseline and can also be used with other models to solve various types of complicated entity recognition tasks.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a28e/5282630/5f091c081209/12859_2017_1476_Fig1_HTML.jpg
