Normalization of relative and incomplete temporal expressions in clinical narratives.

Authors

Sun Weiyi, Rumshisky Anna, Uzuner Ozlem

Affiliations

Department of Informatics, University at Albany, SUNY. Albany, NY

Department of Computer Science, University of Massachusetts Lowell. Lowell, MA.

Publication information

J Am Med Inform Assoc. 2015 Sep;22(5):1001-8. doi: 10.1093/jamia/ocu004. Epub 2015 Apr 12.

DOI:10.1093/jamia/ocu004

PMID:25868462

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4986666/

Abstract

OBJECTIVE

To improve the normalization of relative and incomplete temporal expressions (RI-TIMEXes) in clinical narratives.

METHODS

We analyzed the RI-TIMEXes in temporally annotated corpora and proposed two hypotheses regarding the normalization of RI-TIMEXes in the clinical narrative domain: the anchor point hypothesis and the anchor relation hypothesis. We annotated the RI-TIMEXes in three corpora to study the characteristics of RI-TIMEXes in different domains. This informed the design of our RI-TIMEX normalization system for the clinical domain, which consists of an anchor point classifier, an anchor relation classifier, and a rule-based RI-TIMEX text span parser. We experimented with different feature sets and performed an error analysis for each system component.
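The three-component design described above can be illustrated with a minimal sketch. It is not the authors' implementation: the label sets `ANCHOR_POINTS` and `ANCHOR_RELATIONS`, the helper names `parse_offset` and `normalize`, and the tiny rule set are all hypothetical, since the abstract does not list the actual classes or rules. The sketch assumes the two classifiers have already produced an anchor point label and an anchor relation label, and shows only how those labels could be combined with a rule-based parse of the text span to produce a normalized date.

```python
import re
from datetime import date, timedelta

# Hypothetical label sets; the abstract does not enumerate the real classes.
ANCHOR_POINTS = ("admission", "discharge", "previous_timex")
ANCHOR_RELATIONS = ("before", "after", "same")

# Deliberately small, illustrative rule tables.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
UNIT_DAYS = {"day": 1, "week": 7}


def parse_offset(span: str) -> timedelta:
    """Rule-based parse of a span such as 'two days' or '3 weeks'."""
    match = re.search(r"(\d+|[a-z]+)\s+(day|week)s?\b", span.lower())
    if match is None:
        raise ValueError(f"no quantity/unit pattern in {span!r}")
    quantity, unit = match.groups()
    if quantity.isdigit():
        count = int(quantity)
    elif quantity in NUMBER_WORDS:
        count = NUMBER_WORDS[quantity]
    else:
        raise ValueError(f"unsupported quantity {quantity!r} in {span!r}")
    return timedelta(days=count * UNIT_DAYS[unit])


def normalize(span: str, anchor_point: str, relation: str,
              anchors: dict[str, date]) -> str:
    """Combine the (assumed) classifier outputs with the parsed offset.

    anchor_point and relation stand in for the predictions of the
    anchor point classifier and the anchor relation classifier.
    """
    if anchor_point not in ANCHOR_POINTS:
        raise ValueError(f"unknown anchor point {anchor_point!r}")
    if relation not in ANCHOR_RELATIONS:
        raise ValueError(f"unknown anchor relation {relation!r}")
    anchor = anchors[anchor_point]
    if relation == "same":
        return anchor.isoformat()
    offset = parse_offset(span)
    result = anchor - offset if relation == "before" else anchor + offset
    return result.isoformat()


# Example anchors for one hypothetical discharge summary.
anchors = {
    "admission": date(2012, 3, 1),
    "discharge": date(2012, 3, 10),
    "previous_timex": date(2012, 3, 4),
}
print(normalize("two days later", "previous_timex", "after", anchors))
print(normalize("3 weeks prior to admission", "admission", "before", anchors))
```

Splitting the task this way means the parser only has to recover the size of the offset, while the choice of reference date and direction is left to the two classifiers.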

RESULTS

The annotation confirmed the hypotheses that we can simplify the RI-TIMEX normalization task using two multi-label classifiers. On the held-out test set of the 2012 i2b2 temporal relation challenge, our system achieves accuracies of 74.68% for anchor point classification, 87.71% for anchor relation classification, and 57.2% for rule-based parsing (82.09% under relaxed matching criteria).

DISCUSSION

Experiments with feature sets reveal, among other findings, that the verbal tense feature informs anchor relation classification in clinical narratives less than the tokens near the RI-TIMEX do. Error analysis showed that underrepresented anchor point and anchor relation classes are difficult to detect.

CONCLUSIONS

We formulate the RI-TIMEX normalization problem as a pair of multi-label classification problems. Considering only RI-TIMEX extraction and normalization, the system achieves a statistically significant improvement over the RI-TIMEX results of the best systems in the 2012 i2b2 challenge.
