Mishra Ninad K, Cummo David M, Arnzen James J, Bonander Jason
Centers for Disease Control and Prevention, 1600 Clifton Rd, Mail Stop E76, Atlanta, GA, USA.
J Am Med Inform Assoc. 2009 Jul-Aug;16(4):576-9. doi: 10.1197/jamia.M3086. Epub 2009 Apr 23.
OBJECTIVE Evaluate the effectiveness of a simple rule-based approach in classifying medical discharge summaries according to indicators for obesity and 15 associated co-morbidities as part of the 2008 i2b2 Obesity Challenge. METHODS The authors applied a rule-based approach that looked for occurrences of morbidity-related keywords and identified the types of assertions in which those keywords occurred. The documents were then classified using a simple scoring algorithm based on a mapping of the assertion types to possible judgment categories. MEASUREMENTS Results for the challenge were evaluated based on macro F-measure. The authors report micro and macro F-measure results for all morbidities combined and for each morbidity separately. RESULTS The rule-based approach achieved micro and macro F-measures of 0.97 and 0.77, respectively, ranking fifth among the entries submitted by the 28 teams participating in the classification task based on textual judgments and substantially outperforming the challenge average. CONCLUSIONS As shown by its ranking in the challenge results, this approach performed relatively well under conditions in which limited training data existed for some judgment categories. Further, the approach held up well against more complex approaches applied to this classification task. The approach could be enhanced by the addition of expert rules to model more complex medical reasoning.
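The pipeline described under METHODS (keyword lookup, assertion typing, then a score-based mapping to judgment categories) can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword lists, the negation and uncertainty cues, the tie-breaking order, and the names `KEYWORDS`, `assertion_type`, and `classify` are all hypothetical, chosen only to show the general shape of such a classifier. The judgment labels Y (present), N (absent), Q (questionable), and U (unmentioned) follow the textual judgment categories of the i2b2 Obesity Challenge.

```python
import re
from collections import Counter

# Hypothetical keyword lexicon; the abstract does not list the real one.
KEYWORDS = {
    "obesity": ["obese", "obesity"],
    "hypertension": ["hypertension", "htn"],
    "gout": ["gout"],
}

# Hypothetical cue patterns for crude assertion typing.
NEGATION = re.compile(r"\b(no|denies|without|negative for)\b", re.I)
UNCERTAIN = re.compile(r"\b(possible|probable|questionable|rule out)\b", re.I)

# Assumed mapping from assertion type to judgment category.
ASSERTION_TO_JUDGMENT = {"positive": "Y", "negative": "N", "uncertain": "Q"}

# Assumed tie-break order when two judgments have the same score.
PRIORITY = {"Y": 3, "N": 2, "Q": 1}


def assertion_type(prefix):
    """Type an assertion from the sentence text preceding the keyword."""
    if NEGATION.search(prefix):
        return "negative"
    if UNCERTAIN.search(prefix):
        return "uncertain"
    return "positive"


def classify(document, morbidity):
    """Return Y, N, Q, or U for one morbidity in one discharge summary."""
    scores = Counter()
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", document):
        for keyword in KEYWORDS.get(morbidity, []):
            pattern = r"\b" + re.escape(keyword) + r"\b"
            for match in re.finditer(pattern, sentence, re.I):
                kind = assertion_type(sentence[:match.start()])
                scores[ASSERTION_TO_JUDGMENT[kind]] += 1
    if not scores:
        return "U"  # no keyword found: morbidity is unmentioned
    # Highest score wins; PRIORITY breaks ties.
    return max(scores, key=lambda j: (scores[j], PRIORITY[j]))


summary = "The patient is obese. She denies hypertension. Possible gout."
for name in ("obesity", "hypertension", "gout", "diabetes"):
    print(name, classify(summary, name))
```

A design like this needs no training data for rare categories such as Q or N, which is consistent with the conclusion that the approach held up where few training examples existed; the cost is that every cue and keyword must be curated by hand.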