Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota, USA.
J Am Med Inform Assoc. 2013 Sep-Oct;20(5):836-42. doi: 10.1136/amiajnl-2013-001622. Epub 2013 Apr 4.
The Mayo Clinic developed temporal information detection systems for the 2012 i2b2 Natural Language Processing Challenge.
To construct automated systems for EVENT/TIMEX3 extraction and temporal link (TLINK) identification from clinical text.
The i2b2 organizers provided 190 annotated discharge summaries as the training set and 120 discharge summaries as the test set. Our Event system used a conditional random field (CRF) classifier with a variety of features, including lexical information, natural language elements, and medical ontology. The TIMEX3 system employed a rule-based method using regular expression pattern matching and systematic reasoning to determine normalized values. The TLINK system combined rule-based reasoning and machine learning. All three systems were built within the Apache Unstructured Information Management Architecture (UIMA) framework.
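To make the TIMEX3 approach concrete, here is a minimal Python sketch of regular-expression extraction followed by normalization. The patterns, the `extract_timex3` function, and the use of the admission date as the anchor for relative expressions are hypothetical simplifications for illustration, not the Mayo Clinic rule set. Normalized values are written in the ISO 8601 style used by TimeML TIMEX3 annotations (for example `2012-03-14` for a date and `P7D` for a seven-day duration).

```python
import re
from datetime import date, timedelta

# Hypothetical, deliberately tiny rule set; a real system needs many more patterns.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
NUM = r"(\d+|" + "|".join(NUMBER_WORDS) + r")"

ABSOLUTE_DATE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")  # MM/DD/YYYY
RELATIVE_DAYS = re.compile(
    r"\b" + NUM + r"\s+days?\s+(after|before|prior to)\s+admission\b", re.I)
DURATION_DAYS = re.compile(r"\bfor\s+" + NUM + r"\s+days?\b", re.I)


def to_int(token):
    """Convert a digit string or a small number word to an int."""
    token = token.lower()
    return int(token) if token.isdigit() else NUMBER_WORDS[token]


def extract_timex3(text, admission_date):
    """Return (span, type, normalized_value) tuples in textual order.

    Assumption: relative day expressions are resolved against the
    admission date, which is only one of several possible anchors.
    """
    found = []  # (start, end, span, type, value)
    for m in ABSOLUTE_DATE.finditer(text):
        month, day, year = (int(g) for g in m.groups())
        try:
            value = date(year, month, day).isoformat()
        except ValueError:  # e.g. 13/45/2012 is not a real date
            continue
        found.append((m.start(), m.end(), m.group(0), "DATE", value))
    for m in RELATIVE_DAYS.finditer(text):
        sign = 1 if m.group(2).lower() == "after" else -1
        value = admission_date + timedelta(days=sign * to_int(m.group(1)))
        found.append((m.start(), m.end(), m.group(0), "DATE", value.isoformat()))
    for m in DURATION_DAYS.finditer(text):
        value = f"P{to_int(m.group(1))}D"
        found.append((m.start(), m.end(), m.group(0), "DURATION", value))

    # Greedy overlap resolution: earliest start wins, longer span on ties.
    found.sort(key=lambda t: (t[0], -t[1]))
    results, last_end = [], 0
    for start, end, span, typ, value in found:
        if start >= last_end:
            results.append((span, typ, value))
            last_end = end
    return results
```

For example, `extract_timex3("Admitted 03/14/2012. Fever began two days prior to admission. Vancomycin was given for 7 days.", date(2012, 3, 14))` returns `[("03/14/2012", "DATE", "2012-03-14"), ("two days prior to admission", "DATE", "2012-03-12"), ("for 7 days", "DURATION", "P7D")]`. Separating pattern matching from a normalization step mirrors the two-stage idea described above, though choosing the correct reference date in real notes requires considerably more reasoning than this sketch performs.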
Our TIMEX3 system performed best among the challenge teams (F-measure 0.900, value accuracy 0.731). The Event system achieved an F-measure of 0.870, and the TLINK system an F-measure of 0.537.
Our TIMEX3 system demonstrated that regular expression rules can effectively extract and normalize time information. The Event and TLINK machine learning systems required well-defined feature sets to perform well. Expert knowledge could also be incorporated as machine learning features to further improve TLINK identification performance.
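One way to read the last point is that a hand-written temporal rule can be passed to the learner as just another feature. The sketch below assumes a hypothetical `Mention` record, a single toy rule (`expert_rule`), and a small feature set; it is not the feature set used in the TLINK system, only an illustration of the hybrid pattern. Relation labels follow the BEFORE/AFTER/OVERLAP scheme, with `NONE` meaning the rule abstains.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    text: str
    kind: str      # "EVENT" or "TIMEX3"
    sentence: int  # sentence index within the note
    section: str   # hypothetical section label, e.g. "HISTORY"


def expert_rule(a, b):
    """Hypothetical expert heuristic: an EVENT and a TIMEX3 in the same
    sentence are guessed to OVERLAP; otherwise the rule abstains."""
    if a.sentence == b.sentence and {a.kind, b.kind} == {"EVENT", "TIMEX3"}:
        return "OVERLAP"
    return "NONE"


def pair_features(a, b):
    """Build a feature dictionary for a candidate TLINK between a and b."""
    return {
        "pair_type": f"{a.kind}-{b.kind}",
        "same_sentence": a.sentence == b.sentence,
        "sentence_distance": abs(a.sentence - b.sentence),
        "same_section": a.section == b.section,
        # Expert knowledge enters as one feature; the learner decides how much to trust it.
        "rule_vote": expert_rule(a, b),
    }
```

The resulting dictionaries could be fed to any classifier that accepts feature dictionaries, for instance scikit-learn's `DictVectorizer` followed by a linear model. Treating the rule as a feature rather than a hard decision lets the learner down-weight it where the training data show it is unreliable.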