Bejan Cosmin Adrian, Wei Wei-Qi, Denny Joshua C
Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, USA.
Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, USA; Department of Medicine, Vanderbilt University, Nashville, Tennessee, USA.
J Am Med Inform Assoc. 2015 Apr;22(e1):e162-76. doi: 10.1136/amiajnl-2014-002954. Epub 2014 Oct 21.
To evaluate the contribution of the MEDication Indication (MEDI) resource and SemRep for identifying treatment relations in clinical text.
We first processed clinical documents with SemRep to extract Unified Medical Language System (UMLS) concepts and the treatment relations between them. Then, we incorporated MEDI into a simple algorithm that identifies a treatment relation between two concepts if they match a medication-indication pair in this resource. For better coverage, we expanded MEDI using ontology relationships from RxNorm and the UMLS Metathesaurus. We also developed two ensemble methods, which combined the predictions of SemRep and the MEDI algorithm. We evaluated our selected methods on two datasets: a Vanderbilt corpus of 6864 discharge summaries and the 2010 Informatics for Integrating Biology and the Bedside (i2b2)/Veterans Affairs (VA) challenge dataset.
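The matching step and the ensembles described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the concept identifiers are placeholders rather than real MEDI entries, the function names are hypothetical, matching is assumed to operate over the set of concepts found in one text span, and the two ensembles are assumed to be a set union and a set intersection of the predicted relations.

```python
from typing import Set, Tuple

# A relation is a (medication concept ID, indication concept ID) pair.
Pair = Tuple[str, str]

# Placeholder medication-indication pairs; these are NOT real MEDI entries.
MEDI_PAIRS: Set[Pair] = {
    ("C_MED_1", "C_DIS_1"),
    ("C_MED_2", "C_DIS_2"),
}


def medi_relations(concepts: Set[str], medi_pairs: Set[Pair]) -> Set[Pair]:
    """Return every ordered concept pair that appears in the resource.

    Assumption: `concepts` holds the concept IDs extracted from one span
    of text (for example a sentence); the source does not state the scope.
    """
    found: Set[Pair] = set()
    for medication in concepts:
        for indication in concepts:
            if (medication, indication) in medi_pairs:
                found.add((medication, indication))
    return found


def union_ensemble(semrep: Set[Pair], medi: Set[Pair]) -> Set[Pair]:
    """Keep a relation if either method predicted it (favors recall)."""
    return semrep | medi


def intersection_ensemble(semrep: Set[Pair], medi: Set[Pair]) -> Set[Pair]:
    """Keep a relation only if both methods predicted it (favors precision)."""
    return semrep & medi


if __name__ == "__main__":
    concepts = {"C_MED_1", "C_DIS_1", "C_DIS_2"}
    medi_predictions = medi_relations(concepts, MEDI_PAIRS)
    # Hypothetical SemRep output for the same span.
    semrep_predictions = {("C_MED_1", "C_DIS_1"), ("C_MED_3", "C_DIS_2")}
    print(sorted(union_ensemble(semrep_predictions, medi_predictions)))
    print(sorted(intersection_ensemble(semrep_predictions, medi_predictions)))
```

Representing each method's output as a set of pairs keeps the ensembles to one set operation each. Expanding the resource through ontology relationships would, in this sketch, amount to adding further pairs to `MEDI_PAIRS` before matching.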
The Vanderbilt dataset included 958 manually annotated treatment relations. Twenty-five percent of the relations were double-annotated, with high inter-annotator agreement (Cohen's κ = 0.86). The evaluation consisted of comparing the manually annotated relations with the relations identified by SemRep, the MEDI algorithm, and the two ensemble methods. On the first dataset, the best F1-measure results achieved by the MEDI algorithm and the union of the two resources (78.7 and 80, respectively) were significantly higher than the SemRep results (72.3). On the second dataset, the MEDI algorithm achieved higher precision but significantly lower recall than the best system in the i2b2 challenge. The two systems obtained comparable F1-measure values on the subset of i2b2 relations with both arguments in MEDI.
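For readers unfamiliar with the metrics reported above, the sketch below shows how precision, recall, and F1-measure are conventionally computed when predicted relations are compared with a manually annotated gold standard. It assumes exact matching of relation pairs and uses made-up toy data; the function name is hypothetical, and the source does not describe the authors' exact matching criteria.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def precision_recall_f1(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float, float]:
    """Score predicted relations against gold relations using exact pair matches."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data for illustration only; not drawn from either dataset.
    gold = {("m1", "d1"), ("m2", "d2"), ("m3", "d3"), ("m4", "d4")}
    predicted = {("m1", "d1"), ("m2", "d2"), ("m9", "d9")}
    p, r, f = precision_recall_f1(predicted, gold)
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

The abstract reports these values on a 0 to 100 scale, which corresponds to multiplying the fractions returned here by 100.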
Both SemRep and MEDI can be used to extract treatment relations from clinical text. Knowledge-based extraction with MEDI outperformed SemRep alone, but the best performance was achieved by integrating both systems. Integrating knowledge-based resources such as MEDI into information extraction systems such as SemRep and the i2b2 relation extractors may improve treatment relation extraction from clinical text.