Department of Health Administration, Virginia Commonwealth University, Richmond, VA 23298-0203, USA.
J Biomed Inform. 2010 Apr;43(2):218-23. doi: 10.1016/j.jbi.2009.08.016. Epub 2009 Sep 3.
This article describes a formative natural language processing (NLP) system that is grounded in user-centered design, simplification, and transparency of function.
The NLP system was tasked with classifying diseases within patient discharge summaries and was evaluated against clinician judgment during the 2008 i2b2 Shared Task competition. Text classification is performed by interactive, fully supervised learning using rule-based processes and support vector machines (SVMs).
The macro-averaged F-scores for textual (t) and intuitive (i) classification were 0.614(t) and 0.629(i), while the micro-averaged F-scores were 0.966(t) and 0.954(i) for the competition. These results were comparable to those of the top 10 performing systems.
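The gap between the macro- and micro-averaged scores reported above is typical when some classes are rare: macro-averaging weights every class equally, while micro-averaging pools all decisions, so frequent classes dominate. The following is a minimal sketch of the two averages for single-label classification. The function names, the label set, and the toy data are illustrative assumptions, not the scoring code or data used in the competition.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there are no counts at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(gold, pred):
    """Return (macro_f1, micro_f1) for paired gold and predicted labels."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative
    # Macro: average the per-class F1 values, each class weighted equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical toy data: one rare class ("Q") is always missed.
gold = ["Y"] * 8 + ["N"] + ["Q"]
pred = ["Y"] * 8 + ["N"] + ["Y"]
macro, micro = macro_micro_f1(gold, pred)
print(f"macro={macro:.3f} micro={micro:.3f}")
```

In this toy case a single missed rare class pulls the macro average well below the micro average, which is the same qualitative pattern as the scores reported here.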
The results of this study indicate that an interactive training method, a de novo knowledge base with no external data sources, and simplified text mining processes can achieve comparably high performance in classifying health-related texts. Further research is needed to determine whether the user-centered advantages of an NLP system translate into real-world benefits.