Zhou Li, Melton Genevieve B, Parsons Simon, Hripcsak George
Department of Biomedical Informatics, Columbia University, 622 West 168th Street, VC5, New York, NY 10032, USA.
J Biomed Inform. 2006 Aug;39(4):424-39. doi: 10.1016/j.jbi.2005.07.002. Epub 2005 Aug 29.
Time is an essential element of medical data and knowledge, and it is intrinsically connected with medical reasoning tasks. Many temporal reasoning mechanisms use constraint-based approaches. Our previous research demonstrated that electronic discharge summaries can be modeled as a simple temporal problem (STP).
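As a rough illustration of what modeling a discharge summary as an STP involves, the sketch below uses the standard distance-graph formulation of an STP. Each constraint has the form lo ≤ t_j − t_i ≤ hi. It adds edge i→j with weight hi and edge j→i with weight −lo, and runs Floyd-Warshall all-pairs shortest paths. The network is consistent if and only if no negative cycle appears. If the network is consistent, the resulting matrix is its minimal (tightest) form. The function name `solve_stp`, the use of days as the unit, and the fever example are hypothetical and do not come from the paper.

```python
import math

INF = math.inf


def solve_stp(n, constraints):
    """Solve a simple temporal problem via its distance graph.

    n: number of time points; point 0 is a hypothetical reference (e.g. admission).
    constraints: iterable of (i, j, lo, hi), meaning lo <= t_j - t_i <= hi.
    Returns the minimal distance matrix d, where d[i][j] is the tightest upper
    bound on t_j - t_i, or None if the constraints are inconsistent.
    """
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for i, j, lo, hi in constraints:
        d[i][j] = min(d[i][j], hi)   # t_j - t_i <= hi
        d[j][i] = min(d[j][i], -lo)  # t_i - t_j <= -lo
    # Floyd-Warshall: tighten every pair through every intermediate point.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    # A negative diagonal entry means a negative cycle, i.e. an inconsistent STP.
    if any(d[i][i] < 0 for i in range(n)):
        return None
    return d


# Hypothetical example, units in days:
# 0 = admission, 1 = fever onset, 2 = fever end.
constraints = [
    (1, 0, 3, 3),    # fever began exactly 3 days before admission
    (1, 2, 1, 3),    # fever lasted "about 2 days", read here as 1 to 3 days
    (2, 0, 0, INF),  # fever ended no later than admission
]
d = solve_stp(3, constraints)
# d[2][0] is the longest possible gap from fever end to admission.
```

The all-pairs approach is chosen for its simplicity. It runs in O(n³) time, which is acceptable for the modest number of events in a single summary.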
The objectives were to categorize temporal expressions in clinical narrative text and to propose and evaluate a temporal constraint structure that models this temporal information and supports the implementation of higher-level temporal reasoning.
A corpus of 200 randomly selected discharge summaries spanning 18 years was analyzed with a grounded approach to construct a representation structure. A subset of 100 discharge summaries was then used to tally the frequency of each identified time category and the percentage of temporal expressions that the structure could model. Fifty randomly selected expressions were used to assess inter-coder agreement.
Six main categories of temporal expressions were identified. The constructed temporal constraint structure models the time over which an event occurs by constraining its start and end times. It includes fields for the endpoint(s) of an event, anchor information, qualitative and metric temporal relations, and vagueness. In 100 discharge summaries, 1961 of 2022 identified temporal expressions (97%) were effectively modeled using the temporal constraint structure. Inter-coder evaluation of 50 expressions yielded an exact match in 90%, a partial match with trivial differences in 8%, a partial match with large differences in 2%, and a total mismatch in 0%.
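The abstract lists the kinds of fields in the structure but not their names or types. The following is therefore only a minimal sketch under stated assumptions. The class and field names, the `Relation` values, the day unit, and the choice to carry vagueness as a flag on each bound are all hypothetical. Each endpoint is tied to an anchor event through a qualitative relation and an optional metric distance. An optional duration constrains the end relative to the start. `to_constraints` flattens one instance into rows of the form lo ≤ t(second) − t(first) ≤ hi, which is the shape of an STP constraint.

```python
import math
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Relation(Enum):
    """Hypothetical qualitative relations between an endpoint and its anchor."""
    BEFORE = "before"
    AFTER = "after"
    AT = "at"


@dataclass
class Bound:
    """Nonnegative metric distance in days (lo <= distance <= hi)."""
    lo: float = 0.0
    hi: float = math.inf
    vague: bool = False  # e.g. "about", "several"; how to widen it is left open


@dataclass
class Endpoint:
    anchor: str                                   # reference event, e.g. "admission"
    relation: Relation                            # qualitative relation to the anchor
    offset: Bound = field(default_factory=Bound)  # metric distance from the anchor

    def bounds(self):
        """Return (lo, hi) such that lo <= t(endpoint) - t(anchor) <= hi."""
        if self.relation is Relation.BEFORE:
            return (-self.offset.hi, -self.offset.lo)
        if self.relation is Relation.AFTER:
            return (self.offset.lo, self.offset.hi)
        return (0.0, 0.0)  # Relation.AT


@dataclass
class TemporalConstraint:
    event: str
    start: Optional[Endpoint] = None
    end: Optional[Endpoint] = None
    duration: Optional[Bound] = None  # metric constraint from start to end

    def to_constraints(self):
        """Flatten into (first, second, lo, hi, vague) rows."""
        s, e = f"{self.event}.start", f"{self.event}.end"
        rows = []
        for point, ep in ((s, self.start), (e, self.end)):
            if ep is not None:
                lo, hi = ep.bounds()
                rows.append((ep.anchor, point, lo, hi, ep.offset.vague))
        if self.duration is not None:
            d = self.duration
            rows.append((s, e, d.lo, d.hi, d.vague))
        else:
            # Without a stated duration, still require end >= start.
            rows.append((s, e, 0.0, math.inf, False))
        return rows


# "Fever began 3 days before admission and lasted about 2 days."
fever = TemporalConstraint(
    "fever",
    start=Endpoint("admission", Relation.BEFORE, Bound(3, 3)),
    duration=Bound(1, 3, vague=True),
)
rows = fever.to_constraints()
```

Keeping the anchor as a named event rather than a calendar date lets relative expressions such as "on the second hospital day" stay relative until post-processing resolves them.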
The proposed temporal constraint structure provides a sufficient and effective method for encoding the diverse temporal information found in discharge summaries. Placing the data within this structure yields a foundational representation on which further reasoning can be built, including the addition of domain knowledge and the other post-processing needed to implement an STP.