LIMSI-CNRS, Orsay, France.
J Am Med Inform Assoc. 2013 Sep-Oct;20(5):820-7. doi: 10.1136/amiajnl-2013-001627. Epub 2013 Apr 9.
To identify the temporal relations between clinical events and temporal expressions in clinical reports, as defined in the i2b2/VA 2012 challenge.
To detect clinical events, we used rules and Conditional Random Fields. We built Random Forest models to identify event modality and polarity. To identify temporal expressions, we built on the HeidelTime system. To detect temporal relations, we systematically studied how they break down into distinct situations; we designed an oracle method to determine the most prominent situations and the most suitable classifier for each, and combined the classifiers' results.
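The situation breakdown can be illustrated with a minimal sketch. It assumes one plausible reading of the oracle method: each gold temporal relation is assigned to a situation, and the share of gold relations a situation covers bounds the recall a perfect classifier could reach on it. The `Pair` fields, the three coarse situation names, and the toy data are hypothetical and are not taken from the system described here.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class Pair(NamedTuple):
    """A gold temporal relation between two entities (hypothetical schema)."""
    same_sentence: bool          # both entities occur in the same sentence
    involves_section_time: bool  # one entity is a section-level date, e.g. admission


def situation_of(pair: Pair) -> str:
    """Assign a pair to one of three coarse situation groups (assumed rule order)."""
    if pair.involves_section_time:
        return "section_time"
    if pair.same_sentence:
        return "within_sentence"
    return "across_sentence"


def oracle_recall(gold_pairs: List[Pair]) -> Dict[str, float]:
    """Share of gold relations covered by each situation.

    Under the stated assumption, this is the recall a perfect classifier
    restricted to that situation would obtain.
    """
    total = len(gold_pairs)
    if total == 0:
        return {}
    counts = Counter(situation_of(p) for p in gold_pairs)
    return {situation: n / total for situation, n in counts.items()}


# Toy gold set, invented purely for illustration.
gold = [
    Pair(same_sentence=True, involves_section_time=False),
    Pair(same_sentence=True, involves_section_time=False),
    Pair(same_sentence=False, involves_section_time=True),
    Pair(same_sentence=False, involves_section_time=False),
]
shares = oracle_recall(gold)
```

Ranking situations by this share is one way to decide where a better classifier would pay off most.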
We achieved F-measures of 0.8307 for rule-based event identification and 0.8385 for temporal expression identification. In the temporal relation task, we identified nine main situations in three groups, experimentally confirming shared intuitions: within-sentence relations, section-related time, and across-sentence relations. Logistic regression and Naïve Bayes performed best on the first and third groups, and decision trees on the second. We reached a global F-measure of 0.6231, a 7.5-point improvement over our official submission.
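For readers unfamiliar with the metric, the following sketch shows how an F-measure is computed from precision and recall. It assumes the balanced form (beta = 1), which the abstract does not state explicitly; the function name and the example inputs are illustrative only and are not results from this work.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F-beta).

    beta = 1 gives the balanced F1 score; this default is an assumption,
    not something the abstract specifies.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing is correct
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative inputs only, not figures from the paper.
balanced = f_measure(0.5, 0.5)
skewed = f_measure(1.0, 0.5)
```

Because the harmonic mean is pulled toward the lower of the two values, a system cannot reach a high F-measure through precision or recall alone.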
Carefully hand-crafted rules obtained good results for the detection of events and temporal expressions, while a combination of classifiers improved temporal link prediction. Characterizing the oracle recall of each situation allowed us to point to the directions where further work would be most useful for temporal relation detection: within-sentence relations and linking History of Present Illness events to the admission date. We suggest that the systematic situation breakdown proposed in this paper could also help improve other systems addressing this task.