Solt Illés, Tikk Domonkos, Gál Viktor, Kardkovács Zsolt T
Department of Media Informatics and Telematics, Budapest University of Technology and Economics, Budapest, Hungary.
J Am Med Inform Assoc. 2009 Jul-Aug;16(4):580-4. doi: 10.1197/jamia.M3087. Epub 2009 Apr 23.
OBJECTIVE Automated, disease-specific classification of textual clinical discharge summaries is of great importance in the life sciences, because it supplies statistically relevant data that physicians can analyze in medical studies. Such studies are further facilitated if semantic labels are also extracted when the discharge summaries are labeled, for example whether a given disease is present, absent, or questionable in a patient, or unmentioned in the document. The authors present a classification technique that successfully solves this semantic classification task.
DESIGN The authors introduce a context-aware, rule-based semantic classification technique for use on clinical discharge summaries. The classification is performed in successive steps. First, some misleading parts are removed from the text; next, the text is partitioned into positive, negative, and uncertain context segments; finally, a sequence of binary classifiers is applied to assign the appropriate semantic labels.
MEASUREMENTS For evaluation, the authors used the documents of the i2b2 Obesity Challenge and adopted its evaluation measures, F1-macro and F1-micro.
RESULTS On the two subtasks of the Obesity Challenge (textual and intuitive classification), the system achieved an F1-macro of 0.80 on the textual task and 0.67 on the intuitive task, placing second in the textual and first in the intuitive subtask of the challenge.
CONCLUSIONS The authors show that a simple rule-based classifier can tackle the semantic classification task more successfully than machine learning techniques when the training data are limited and some semantic labels are very sparse.
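The three design steps described in the abstract (removing misleading text, partitioning into context segments, and applying a sequence of binary decisions) can be sketched roughly as follows. This is a minimal illustration, not the authors' system: the cue words, the family-history rule, the sentence splitter, the order of the decisions, and all function names are assumptions made for this example, since the abstract does not give the actual rules.

```python
import re

# Hypothetical cue lists; the real rule set is not given in the abstract.
UNCERTAIN_CUES = re.compile(r"\b(possible|questionable|suspected|rule out)\b")
NEGATION_CUES = re.compile(r"\b(no|denies|without|negative for)\b")
# Hypothetical "misleading" part: a family-history line describes relatives.
MISLEADING = re.compile(r"family history:.*?(?:\n|$)", re.IGNORECASE)


def remove_misleading(text):
    """Step 1: drop text that would mislead the later steps."""
    return MISLEADING.sub("", text)


def partition(text):
    """Step 2: tag each sentence as positive, negative, or uncertain context."""
    segments = []
    for sentence in re.split(r"[.\n]+", text.lower()):
        sentence = sentence.strip()
        if not sentence:
            continue
        if UNCERTAIN_CUES.search(sentence):
            context = "uncertain"
        elif NEGATION_CUES.search(sentence):
            context = "negative"
        else:
            context = "positive"
        segments.append((context, sentence))
    return segments


def classify(text, disease_terms):
    """Step 3: a sequence of yes/no decisions yields one semantic label."""
    segments = partition(remove_misleading(text))
    contexts = {
        context
        for context, sentence in segments
        if any(term in sentence for term in disease_terms)
    }
    if not contexts:               # decision 1: is the disease mentioned at all?
        return "unmentioned"
    if "positive" in contexts:     # decision 2: mentioned in a positive context?
        return "present"
    if "negative" in contexts:     # decision 3: mentioned in a negative context?
        return "absent"
    return "questionable"          # only uncertain mentions remain


note = (
    "Family history: mother had diabetes.\n"
    "Patient denies asthma. Possible gout. "
    "Hypertension treated with lisinopril."
)
for disease in ("diabetes", "asthma", "gout", "hypertension"):
    print(disease, classify(note, [disease]))
```

In this toy note, the family-history line is discarded before partitioning, so the mention of diabetes in a relative does not produce a label for the patient. A real system would need far richer cue lists and section handling than this sketch assumes.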
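The two evaluation measures named in the abstract, F1-macro and F1-micro, can be illustrated with a short sketch. It uses the common textbook definitions (macro as the unweighted mean of per-class F1, micro as F1 over pooled counts); the exact formulas used by the i2b2 Obesity Challenge may differ, and the function name and toy labels here are assumptions for illustration only.

```python
from collections import Counter


def f1_scores(gold, predicted):
    """Return (macro_f1, micro_f1) for two equal-length label sequences."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1   # predicted class gains a false positive
            fn[g] += 1   # gold class gains a false negative

    def f1(true_pos, false_pos, false_neg):
        denom = 2 * true_pos + false_pos + false_neg
        return 2 * true_pos / denom if denom else 0.0

    labels = set(gold) | set(predicted)
    if not labels:
        return 0.0, 0.0
    # Macro: every class counts equally, however rare it is.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts first, so frequent classes dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


gold = ["present", "present", "absent", "unmentioned"]
pred = ["present", "absent", "absent", "unmentioned"]
macro, micro = f1_scores(gold, pred)
print(f"macro={macro:.3f} micro={micro:.3f}")
```

Because the macro average weights every label equally, errors on sparse labels lower it much more than they lower the micro average, which is relevant to the abstract's remark that some semantic labels are very sparse.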