Kim Youngjun, Garvin Jennifer, Heavirland Julia, Meystre Stéphane M
School of Computing, University of Utah, Salt Lake City, Utah, U.S.
Stud Health Technol Inform. 2013;192:185-9.
Adapting an information extraction application to a new domain (e.g., new categories of narrative text) typically requires re-training the application with narratives from that domain. But could the training already done on the original domain reduce the effort of this adaptation? After developing an NLP-based application to extract congestive heart failure treatment performance measures from echocardiogram reports (the source domain), we adapted it to a large variety of clinical documents (the target domain). To reuse the machine learning models trained on the source domain, we experimented with several popular domain adaptation approaches, such as reusing the source model's predictions or applying linear interpolation. By reusing the source-domain models, we measured higher recall and precision (92.4% and 95.3%, respectively) than when training on the target domain only.
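The two adaptation strategies named above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions. Each model is taken to be a function that returns the probability that a candidate mention is positive. The function names `pred_augment`, `lin_int` and `tune_lambda` are hypothetical, as are the `source_pred` feature name and the accuracy criterion used to choose the interpolation weight `lam` on labeled target-domain development data.

```python
from typing import Callable, Dict, Sequence

# Hypothetical representation: a candidate is a sparse feature dict, and a
# model maps it to P(positive | features).
Features = Dict[str, float]
Model = Callable[[Features], float]


def pred_augment(x: Features, source_model: Model) -> Features:
    """Prediction reuse: copy the target features and append the source
    model's prediction as one extra feature for the target-domain learner."""
    augmented = dict(x)
    augmented["source_pred"] = source_model(x)  # hypothetical feature name
    return augmented


def lin_int(source_model: Model, target_model: Model, lam: float) -> Model:
    """Linear interpolation: mix source and target probabilities,
    p(x) = lam * p_source(x) + (1 - lam) * p_target(x)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")

    def mixed(x: Features) -> float:
        return lam * source_model(x) + (1.0 - lam) * target_model(x)

    return mixed


def tune_lambda(source_model: Model, target_model: Model,
                dev_x: Sequence[Features], dev_y: Sequence[bool],
                grid: Sequence[float]) -> float:
    """Pick the weight from `grid` with the best accuracy on labeled
    target-domain development data (assumed selection criterion)."""
    def accuracy(lam: float) -> float:
        model = lin_int(source_model, target_model, lam)
        hits = sum((model(x) >= 0.5) == y for x, y in zip(dev_x, dev_y))
        return hits / len(dev_y)

    return max(grid, key=accuracy)
```

For example, when the source model is reliable on the development set and the target model is not, `tune_lambda` returns a weight close to 1 and the interpolated model leans on the source domain. Prediction reuse instead leaves the final decision to a model trained on target data, which can learn how far to trust the `source_pred` feature.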