School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA.
Pieces Technologies, Dallas, TX, USA.
BMC Med Inform Decis Mak. 2018 Jul 23;18(Suppl 2):49. doi: 10.1186/s12911-018-0627-5.
Most current work on clinical temporal relation identification follows the convention developed in the general domain, aiming to identify a comprehensive set of temporal relations, both explicit and implicit, from a document. While such a comprehensive set can represent the temporal information in a document completely, some of its relations may not be essential, depending on the clinical application of interest. Moreover, because explicit and implicit relations must be identified from different types of evidence, current clinical temporal relation identification systems that target both still perform too poorly for practical use.
In this paper, we propose to focus on a sub-task of the conventional temporal relation identification task in order to provide insight into building practical temporal relation identification modules for clinical text. We focus on the identification of direct temporal relations, a subset of temporal relations chosen to minimize the amount of inference required to identify them. We construct a corpus of direct temporal relations between time expressions and event mentions, and develop an automatic system tailored to direct temporal relations.
We show that direct temporal relations constitute a major category of temporal relations and contain important information needed for clinical applications. The system optimized for direct temporal relations achieves better performance than the state-of-the-art system developed with a comprehensive set of both explicit and implicit relations in mind.
We expect direct temporal relations to facilitate the development of practical temporal information extraction tools in the clinical domain.