Kim Youngjun, Riloff Ellen, Hurdle John F
School of Computing, University of Utah, Salt Lake City, UT.
Department of Biomedical Informatics, University of Utah, Salt Lake City, UT.
AMIA Annu Symp Proc. 2015 Nov 5;2015:737-46. eCollection 2015.
Our research investigates methods for creating effective concept extractors for specialty clinical notes. First, we present three new "specialty area" datasets consisting of Cardiology, Neurology, and Orthopedics clinical notes manually annotated with medical concepts. We analyze the medical concepts in each dataset and compare them with those in the widely used i2b2 2010 corpus. Second, we create several types of concept extraction models and examine the effects of training supervised learners with specialty area data versus i2b2 data. We find substantial differences in performance across the datasets, and obtain the best results for all three specialty areas by training with both i2b2 and specialty data. Third, we explore strategies to improve concept extraction on specialty notes with ensemble methods. We compare two types of ensemble methods (Voting and Stacking) and a domain adaptation model, and show that a Stacked ensemble of classifiers trained with i2b2 and specialty data yields the best performance.
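As a rough illustration of the Voting idea mentioned above, the following minimal sketch combines per-token labels from several concept extractors. It is not the authors' implementation: the function name `vote_labels`, the `min_votes` threshold, the BIO-style label strings, and the three example model outputs are all hypothetical, and voting over individual tokens is a simplifying assumption made only to keep the example short.

```python
from collections import Counter
from typing import List, Sequence


def vote_labels(
    predictions: Sequence[Sequence[str]],
    min_votes: int = 2,
    default: str = "O",
) -> List[str]:
    """Combine per-token labels from several extractors by voting.

    `predictions` holds one label sequence per model, all over the same
    tokens. A concept label is kept only if at least `min_votes` models
    agree on it; otherwise the token falls back to `default`.
    (Hypothetical helper for illustration; not from the paper.)
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all models must label the same tokens")

    combined = []
    for token_labels in zip(*predictions):
        # Count only concept labels; the default "outside" label never votes.
        counts = Counter(label for label in token_labels if label != default)
        if counts:
            # Ties are broken in favour of the label seen first (model order).
            label, votes = counts.most_common(1)[0]
            combined.append(label if votes >= min_votes else default)
        else:
            combined.append(default)
    return combined


# Assumed outputs of three models for the tokens:
# "chest pain relieved by nitroglycerin"
model_i2b2 = ["B-problem", "I-problem", "O", "O", "B-treatment"]
model_specialty = ["B-problem", "I-problem", "O", "O", "B-treatment"]
model_combined = ["B-problem", "O", "O", "O", "O"]

voted = vote_labels([model_i2b2, model_specialty, model_combined])
print(voted)
```

A Stacked ensemble would instead feed the component models' outputs as features to a second-level classifier that learns how to weigh them, which this sketch does not attempt.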