Lin Chen, Dligach Dmitriy, Miller Timothy A, Bethard Steven, Savova Guergana K
Boston Children's Hospital Boston, Boston, Massachusetts, USA
Boston Children's Hospital Boston, Boston, Massachusetts, USA; Harvard Medical School, Harvard University, Boston, Massachusetts, USA.
J Am Med Inform Assoc. 2016 Mar;23(2):387-95. doi: 10.1093/jamia/ocv113. Epub 2015 Oct 31.
To develop an open-source temporal relation discovery system for the clinical domain. The system automatically infers temporal relations between events and time expressions using a multilayered modeling strategy. It can operate at different levels of granularity: from coarse temporality, expressed as the relation of each event to the document creation time (DCT), through temporal containment, to fine-grained classic Allen-style relations.
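The three levels of granularity named above can be illustrated with a minimal Python sketch. It is not the system's implementation: it assumes events and time expressions are already normalized to integer time intervals, and the names `Interval`, `allen_relation`, `contains`, and `doc_time_rel` are hypothetical. The three-way BEFORE/OVERLAP/AFTER label set used for the DCT relation is also a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A closed time span on an integer timeline (hypothetical representation)."""
    start: int
    end: int  # assumed to be strictly greater than start


def allen_relation(a: Interval, b: Interval) -> str:
    """Fine-grained level: one of the 13 Allen relations of a relative to b."""
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"
    if b.end < a.start:
        return "after"
    if b.end == a.start:
        return "met-by"
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start:
        return "starts" if a.end < b.end else "started-by"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finished-by"
    if b.start < a.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.end < a.end:
        return "contains"
    # Only proper partial overlap remains at this point.
    return "overlaps" if a.start < b.start else "overlapped-by"


def contains(a: Interval, b: Interval) -> bool:
    """Intermediate level: does a temporally contain b (boundaries included)?"""
    return a.start <= b.start and b.end <= a.end


def doc_time_rel(event: Interval, dct: int) -> str:
    """Coarse level: relation of an event to the document creation time.

    Simplified three-way labeling, assumed here for illustration only.
    """
    if event.end < dct:
        return "BEFORE"
    if event.start > dct:
        return "AFTER"
    return "OVERLAP"
```

For instance, an admission spanning days 1 to 10 contains a procedure on days 2 to 4, the procedure stands in the Allen relation "during" to the admission, and both are labeled "BEFORE" relative to a note written on day 12.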
We evaluated our system on 2 clinical corpora. One is a subset of the Temporal Histories of Your Medical Events (THYME) corpus, which was used in SemEval 2015 Task 6: Clinical TempEval. The other is the 2012 Informatics for Integrating Biology and the Bedside (i2b2) challenge corpus. We designed multiple supervised machine learning models to compute the DCT relation and within-sentence temporal relations. For the i2b2 data, we also developed models and rule-based methods to recognize cross-sentence temporal relations. We used the official evaluation scripts of both challenges to make our results comparable with those of other participating systems. In addition, we conducted a feature ablation study to measure the contribution of various features to the system's performance.
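The general shape of a feature ablation study can be sketched as follows: score the system with all feature groups, then re-score with each group removed and record the drop. This is a generic illustration, not the authors' procedure. The function names, the feature group names, and the toy `evaluate` stand-in are hypothetical, and the plain set-based F1 here is a simplification rather than the official challenge scoring.

```python
from typing import Callable, Dict, FrozenSet, Set


def f1_score(gold: Set[str], predicted: Set[str]) -> float:
    """Plain F1 over sets of relation identifiers (simplified scoring)."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def ablation_study(
    feature_groups: FrozenSet[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Return, for each feature group, the score lost when it is removed."""
    full_score = evaluate(feature_groups)
    return {
        group: full_score - evaluate(feature_groups - {group})
        for group in sorted(feature_groups)
    }


# Toy stand-in for "train and score a model": each hypothetical feature
# group is assumed to recover a fixed set of artificial gold relations.
GOLD = {"r1", "r2", "r3", "r4"}
RECOVERED_BY = {
    "tokens": {"r1", "r2"},
    "pos_tags": {"r2"},
    "dependency_path": {"r3", "r4"},
}


def evaluate(groups: FrozenSet[str]) -> float:
    predicted: Set[str] = set()
    for group in groups:
        predicted |= RECOVERED_BY[group]
    return f1_score(GOLD, predicted)


drops = ablation_study(frozenset(RECOVERED_BY), evaluate)
```

In this artificial setup, removing `pos_tags` costs nothing because `tokens` already recovers the same relation, while removing `dependency_path` costs the most. A real study would retrain the model for each configuration instead of using a lookup table.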
Our system achieved state-of-the-art performance on the Clinical TempEval corpus and was on par with the best systems on the i2b2 2012 corpus. In particular, on the Clinical TempEval corpus, our system established a new F1 score benchmark, a statistically significant improvement over both the baseline and the best participating system.
Presented here is the first open-source clinical temporal relation discovery system. It was built using a multilayered temporal modeling strategy and achieved top performance in 2 major shared tasks.