Marshfield Clinic Research Institute, Marshfield, WI, USA; Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA.
Marshfield Clinic Research Institute, Marshfield, WI, USA.
J Biomed Inform. 2019 Jun;94:103185. doi: 10.1016/j.jbi.2019.103185. Epub 2019 Apr 25.
To develop machine learning models for classifying the severity of opioid overdose events from clinical data.
Opioid overdoses in the Marshfield Clinic population were identified by diagnosis codes and assigned a severity score via chart review to form a gold standard set of labels. Three primary feature sets were constructed from disparate data sources surrounding each event and used to train machine learning models for phenotyping.
Random forest and penalized logistic regression models performed best, with cross-validated mean areas under the ROC curve (AUCs) across all severity classes of 0.893 and 0.882, respectively. Features derived from a common data model outperformed features collected from disparate data sources for the same cohort of patients (AUC 0.893 versus 0.837, p = 0.002). Adding features extracted from free text to the machine learning models also increased the AUC from 0.827 to 0.893 (p < 0.0001). Keyword features extracted using natural language processing (NLP), such as 'Narcan' and 'Endotracheal Tube', were important for classifying overdose event severity.
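As an illustration of how a mean AUC across severity classes can be computed, the following is a minimal sketch, not the authors' code. It assumes the mean is an unweighted average of one-vs-rest AUCs, which the abstract does not state, and it uses hypothetical integer severity labels and made-up scores. Cross-validation is omitted; in practice the function would be applied to each held-out fold and the results averaged.

```python
from typing import Dict, Hashable, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC from the rank-sum (Mann-Whitney U) identity; ties get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the group of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def mean_ovr_auc(
    y_true: Sequence[Hashable],
    class_scores: Dict[Hashable, Sequence[float]],
) -> float:
    """Unweighted mean of one-vs-rest AUCs (an assumed averaging scheme)."""
    aucs = []
    for cls, scores in class_scores.items():
        one_vs_rest = [1 if y == cls else 0 for y in y_true]
        aucs.append(binary_auc(one_vs_rest, scores))
    return sum(aucs) / len(aucs)


# Hypothetical severity labels (0, 1, 2) and per-class predicted scores.
y_true = [0, 1, 2, 0, 1, 2]
class_scores = {
    0: [0.90, 0.10, 0.10, 0.80, 0.20, 0.10],
    1: [0.05, 0.80, 0.20, 0.10, 0.70, 0.20],
    2: [0.05, 0.10, 0.70, 0.10, 0.10, 0.70],
}
print(mean_ovr_auc(y_true, class_scores))
```

The rank-based form avoids building the ROC curve explicitly and handles tied scores by giving them their average rank.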
Random forest models using features derived from a common data model and free text can be effective for classifying opioid overdose events.
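To make the idea of keyword features from free text concrete, here is a minimal sketch under stated assumptions. The abstract does not describe the NLP pipeline, so the function `keyword_features`, the regular-expression matching, and the example notes are all hypothetical. This simple version has no negation handling, so a note saying "no Narcan given" would still be flagged.

```python
import re
from typing import Dict, Iterable


def keyword_features(note: str, keywords: Iterable[str]) -> Dict[str, int]:
    """Return a 0/1 indicator per keyword: 1 if it appears in the note."""
    features = {}
    for kw in keywords:
        # Match whole words, case-insensitively, and allow any run of
        # whitespace between the words of a multi-word term.
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in kw.split()) + r"\b"
        found = re.search(pattern, note, flags=re.IGNORECASE) is not None
        features[kw] = int(found)
    return features


# The two keywords named in the abstract; the note text is invented.
KEYWORDS = ["Narcan", "Endotracheal Tube"]
note = "Pt given narcan x2; endotracheal  tube placed."
print(keyword_features(note, KEYWORDS))
```

Indicators like these could then be appended to the structured feature vector before model training.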