Biomedical Informatics Center, Medical University of South Carolina, Charleston, South Carolina, USA.
J Am Med Inform Assoc. 2020 Jan 1;27(1):31-38. doi: 10.1093/jamia/ocz100.
Accurate and complete medication information and related details are crucial for effective clinical decision support and precise health care. Recognizing and reducing adverse drug events is also central to effective patient care. The goal of this research was to develop a natural language processing (NLP) system that automatically extracts medication and adverse drug event information from electronic health records. This effort was part of the 2018 n2c2 shared task on adverse drug events and medication extraction.
The new NLP system implements stacked generalization based on a search-based structured prediction algorithm for concept extraction. We trained 4 sequential classifiers using a variety of structured learning algorithms. To enhance accuracy, we created a stacked ensemble of these concept extraction models, trained on the shared task training data. We implemented a support vector machine model to identify relations between concepts.
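The stacking idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the meta-learner here is a simple lookup table rather than a search-based structured prediction algorithm, and the function names (`train_meta`, `predict_meta`) and the label strings are hypothetical. It only shows how a second-level model can learn from the aligned outputs of several base sequence taggers.

```python
from collections import Counter, defaultdict


def train_meta(base_predictions, gold_labels):
    """Learn a lookup-table meta-classifier (a toy stand-in for the real meta-learner).

    base_predictions: one label sequence per base model, all aligned to the same tokens.
    gold_labels: the gold label sequence for those tokens.
    """
    votes = defaultdict(Counter)
    # zip(*base_predictions) yields, per token, the tuple of labels from all base models.
    for token_preds, gold in zip(zip(*base_predictions), gold_labels):
        votes[token_preds][gold] += 1
    # For each observed combination of base outputs, keep the most frequent gold label.
    return {combo: counts.most_common(1)[0][0] for combo, counts in votes.items()}


def predict_meta(table, base_predictions):
    """Combine base-model outputs token by token using the learned table."""
    output = []
    for token_preds in zip(*base_predictions):
        if token_preds in table:
            output.append(table[token_preds])
        else:
            # Unseen combination: fall back to a majority vote over the base models.
            output.append(Counter(token_preds).most_common(1)[0][0])
    return output


# Hypothetical outputs of 4 base taggers on 4 training tokens.
train_base = [
    ["B-Drug", "O", "O", "B-ADE"],
    ["B-Drug", "B-ADE", "O", "O"],
    ["O", "B-ADE", "O", "B-ADE"],
    ["B-Drug", "B-ADE", "O", "B-ADE"],
]
train_gold = ["B-Drug", "B-ADE", "O", "B-ADE"]
meta_table = train_meta(train_base, train_gold)
```

In practice, the base-model predictions used to train a meta-learner are usually produced on held-out folds, so that the second level does not simply learn the base models' overfitting to their own training data.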
Experiments with the official test set showed that our stacked ensemble achieved an F1 score of 92.66%. The relation extraction model with given concepts reached a 93.59% F1 score. Our end-to-end system yielded overall micro-averaged recall, precision, and F1 score of 92.52%, 81.88%, and 86.88%, respectively. Our NLP system for adverse drug events and medication extraction ranked among the top 5 teams participating in the challenge.
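As a quick consistency check on the end-to-end figures, the F1 score is the harmonic mean of precision and recall, so the reported 86.88% can be recomputed from 92.52% and 81.88%. The sketch below also shows how micro-averaging pools counts across classes before computing the metrics. The function names and the example counts are illustrative assumptions, not taken from the shared task evaluation script.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (any consistent scale, e.g. percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_prf(counts):
    """Micro-averaged precision, recall, and F1.

    counts: iterable of (true_positives, false_positives, false_negatives) per class.
    Counts are summed over all classes first, then the metrics are computed once.
    """
    counts = list(counts)
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Recompute the end-to-end F1 from the two reported values.
# F1 is symmetric, so the order of the two arguments does not matter.
end_to_end_f1 = round(f1_score(92.52, 81.88), 2)  # 86.88
```

Because F1 is symmetric in its two inputs, this check confirms the reported score regardless of which of the two values is precision and which is recall.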
This study demonstrated that a stacked ensemble with a search-based structured prediction algorithm achieved good performance by effectively integrating the output of individual classifiers and could provide a valid solution for other clinical concept extraction tasks.