Dept. of Mathematics, Seminar for Statistics, ETH Zurich, Universitätstrasse 6, 8092, Zurich, Switzerland.
Center for Biomedical Informatics, Brown University, 233 Richmond Street, Providence, RI, 02912, United States.
Int J Med Inform. 2019 Sep;129:20-28. doi: 10.1016/j.ijmedinf.2019.05.020. Epub 2019 May 23.
Manual annotation and categorization of non-standardized text ("free-text") of drug orders entered into electronic health records is a labor-intensive task. However, standardization is required for drug order analyses and has implications for clinical decision support. Machine learning could help to speed up manual labelling efforts. The objective of this study was to analyze the performance of deep learning methods in annotating non-standardized text of drug order entries with their therapeutically active ingredients.
The data consisted of drug orders entered between August 2009 and April 2014 into the electronic health records of inpatients at a large tertiary care academic medical center. We manually annotated the most frequent order entry patterns with the active ingredient they contain (e.g., the order text "Prograf" was labelled with "Tacrolimus"). To augment the training dataset, we heuristically included additional orders by means of character sequence comparisons. Finally, we trained character-level recurrent deep neural networks and applied them to classify non-standardized drug order text according to its active ingredients.
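The abstract does not specify which character sequence comparison was used for augmentation. As a minimal sketch, assuming Python's standard-library `difflib.SequenceMatcher` as the similarity measure, an added substring rule, and a hypothetical similarity threshold of 0.85, propagating labels from annotated patterns to unlabeled order texts could look like this (the function names and toy data are illustrative only):

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so comparisons ignore formatting."""
    return " ".join(text.lower().split())


def augment_labels(labeled: dict[str, str], unlabeled: list[str],
                   threshold: float = 0.85) -> dict[str, str]:
    """Hypothetical heuristic: propagate ingredient labels to unlabeled texts.

    An unlabeled text collects candidate ingredients from (a) every annotated
    pattern that occurs verbatim inside it and (b) the most similar annotated
    pattern, if their SequenceMatcher ratio reaches `threshold` (an assumed
    value). The text is labelled only if exactly one candidate remains;
    ambiguous texts stay unlabeled.
    """
    augmented = {}
    for text in unlabeled:
        t = normalize(text)
        candidates = set()
        best_ratio, best_label = 0.0, None
        for pattern, ingredient in labeled.items():
            p = normalize(pattern)
            if p in t:  # annotated pattern appears as a substring
                candidates.add(ingredient)
            ratio = SequenceMatcher(None, p, t).ratio()
            if ratio > best_ratio:
                best_ratio, best_label = ratio, ingredient
        if best_ratio >= threshold:  # close spelling variant of a pattern
            candidates.add(best_label)
        if len(candidates) == 1:
            augmented[text] = candidates.pop()
    return augmented


labeled = {"Prograf": "Tacrolimus", "Tacrolimus": "Tacrolimus",
           "Aspirin": "Acetylsalicylic acid"}
unlabeled = ["prograf 1 mg cap", "Tacrolimis", "Insulin glargine"]
print(augment_labels(labeled, unlabeled))
# {'prograf 1 mg cap': 'Tacrolimus', 'Tacrolimis': 'Tacrolimus'}
```

Here "prograf 1 mg cap" inherits Tacrolimus through the substring rule, the misspelling "Tacrolimis" through a similarity ratio of 0.9, and "Insulin glargine" stays unlabeled. Skipping ambiguous matches trades recall for fewer wrong labels, which matters because propagated labels become training data.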
A total of 26,611 distinct order patterns were considered in our study, of which the 2,028 most frequent (7.6%) had been annotated with one of 558 distinct ingredients, leaving 24,583 unlabeled observations. Character-level recurrent deep neural networks achieved a Mean Reciprocal Rank (MRR) of 98% and outperformed the best representative baseline, a trigram-based Support Vector Machine, by 2 percentage points.
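MRR averages, over all evaluated order texts, the reciprocal of the rank at which the true ingredient appears in the model's ranked list of candidate ingredients: MRR = (1/|Q|) Σ 1/rank_i. A short sketch of the computation, assuming a ranked ingredient list is available per order text and scoring a true ingredient that is missing from the list as 0 (toy data, not study results):

```python
def mean_reciprocal_rank(rankings: list[list[str]], truths: list[str]) -> float:
    """Mean Reciprocal Rank over a set of predictions.

    rankings[i] is the model's list of candidate ingredients for order i,
    ordered from most to least likely; truths[i] is the annotated ingredient.
    A true ingredient absent from the ranking contributes 0.
    """
    if len(rankings) != len(truths):
        raise ValueError("rankings and truths must have the same length")
    if not truths:
        raise ValueError("need at least one prediction")
    total = 0.0
    for ranked, truth in zip(rankings, truths):
        if truth in ranked:
            # ranks are 1-based: position 1 contributes 1, position 2 contributes 1/2, ...
            total += 1.0 / (ranked.index(truth) + 1)
    return total / len(truths)


rankings = [
    ["Tacrolimus", "Sirolimus"],   # correct at rank 1 -> 1
    ["Heparin", "Enoxaparin"],     # correct at rank 2 -> 1/2
    ["Insulin"],                   # correct ingredient missing -> 0
]
truths = ["Tacrolimus", "Enoxaparin", "Metformin"]
print(mean_reciprocal_rank(rankings, truths))  # 0.5
```

An MRR of 98% therefore means that the true ingredient was, on average, ranked first for nearly all order texts.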
Character-level recurrent deep neural networks can be used to map non-standardized drug order text to its active ingredients, outperforming other representative techniques. While machine learning may help to facilitate such categorization tasks, a considerable amount of manual labelling and reviewing work is still required to train these systems.