School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, China.
School of Computer, South China Normal University, Guangzhou, China.
BMC Med Inform Decis Mak. 2018 Mar 22;18(Suppl 1):22. doi: 10.1186/s12911-018-0595-9.
Temporal expression extraction and normalization is a fundamental step in clinical text processing and analysis. Although a variety of commonly used NLP tools are available for medical temporal information extraction, little existing work performs satisfactorily on multi-lingual, heterogeneous clinical texts.
A novel method called TEER is proposed for multi-lingual temporal expression extraction and normalization from various types of narrative clinical text, including clinical data requests, clinical notes, and clinical trial summaries. TEER is characterized by temporal feature summarization, heuristic rule generation, and automatic pattern learning. By representing a temporal expression as a triple <M, A, N>, TEER identifies temporal mentions M, assigns type attributes A to M, and normalizes the values of M into formal representations N.
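To make the <M, A, N> triple concrete, the following is a minimal illustrative sketch in Python and not the TEER implementation. The class name `TemporalExpression`, the function `extract`, the two hand-written regular expressions, the attribute labels `DATE` and `DURATION`, and the ISO 8601-style normalized values are all assumptions chosen for illustration; the abstract does not specify TEER's rules, type inventory, or output format.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TemporalExpression:
    """One <M, A, N> triple (field names are illustrative, not from the paper)."""
    mention: str     # M: the text span as it appears in the note
    attribute: str   # A: a type label, here an assumed DATE / DURATION inventory
    normalized: str  # N: an assumed ISO 8601-style normalized value


MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Hypothetical hand-written rules; TEER learns its patterns, these do not.
DATE_RE = re.compile(
    r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2}),\s*(\d{4})\b", re.IGNORECASE)
DURATION_RE = re.compile(r"\b(\d+)\s+(day|week|month|year)s?\b", re.IGNORECASE)
UNIT_CODES = {"day": "D", "week": "W", "month": "M", "year": "Y"}


def extract(text: str) -> List[TemporalExpression]:
    """Return triples for the two toy patterns, ordered by position in text."""
    found = []
    for m in DATE_RE.finditer(text):
        month = MONTHS[m.group(1).lower()]
        value = f"{int(m.group(3)):04d}-{month:02d}-{int(m.group(2)):02d}"
        found.append((m.start(), TemporalExpression(m.group(0), "DATE", value)))
    for m in DURATION_RE.finditer(text):
        value = f"P{int(m.group(1))}{UNIT_CODES[m.group(2).lower()]}"
        found.append((m.start(), TemporalExpression(m.group(0), "DURATION", value)))
    found.sort(key=lambda pair: pair[0])
    return [expression for _, expression in found]


if __name__ == "__main__":
    for triple in extract("Admitted on March 22, 2018 and treated for 3 weeks."):
        print(triple)
```

The sketch keeps the three steps separate in spirit: the regular expression match supplies M, the rule that fired supplies A, and a small formatting step supplies N.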
TEER was compared with six state-of-the-art baselines on two heterogeneous clinical text datasets: 400 actual clinical requests in English and 1459 clinical discharge summaries in Chinese. TEER achieved a precision of 0.948 and a recall of 0.877 on the English clinical requests, and a precision of 0.941 and a recall of 0.932 on the Chinese discharge summaries.
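For readers who want a single summary figure, the reported precision and recall can be combined into the standard F1 score (their harmonic mean). The abstract itself does not state F1, so the values below are derived here from the reported numbers using the textbook formula; the function name `f1_score` is an illustrative choice.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall as reported in the abstract.
    print(round(f1_score(0.948, 0.877), 3))  # English clinical requests -> 0.911
    print(round(f1_score(0.941, 0.932), 3))  # Chinese discharge summaries -> 0.936
```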
An automated method, TEER, for multi-lingual temporal expression extraction and normalization was presented. On the two datasets of heterogeneous clinical texts, the comparison with the baselines demonstrated the effectiveness of TEER in extracting temporal expressions from narrative clinical text in both English and Chinese.