Queen's University, Canada.
University of Manitoba, Canada.
Health Informatics J. 2021 Oct-Dec;27(4):14604582211053259. doi: 10.1177/14604582211053259.
This study proposes a predictive model that uses structured data and unstructured narrative notes from Electronic Medical Records to accurately identify patients diagnosed with Post-Traumatic Stress Disorder (PTSD). We use data from primary care clinicians participating in the Manitoba Primary Care Research Network (MaPCReN), representing 154,118 patients. A reference sample of 195 patients whose PTSD diagnosis was confirmed by manual chart review of structured data and narrative notes, together with PTSD-negative patients, serves as the gold standard for model training, validation, and testing. We assess structured and unstructured data from eight tables in the MaPCReN, namely patient demographics, disease case, examinations, medication, billing records, health condition, risk factors, and encounter notes. Feature engineering is applied to convert the data into a representation suitable for predictive modeling. We explore serial and parallel mixed data models that are trained on both structured and unstructured data to identify PTSD. Model performance was calculated on a highly skewed hold-out test dataset. The serial model that uses both structured and text data as input yielded the highest sensitivity (0.77), F-measure (0.76), and AUC (0.88), and the parallel model that uses both structured and text data as input obtained the highest positive predictive value (PPV) (0.75). Diseases such as PTSD are difficult to diagnose. Information recorded in chart notes over multiple patient visits with primary care physicians has higher predictive power than structured data, and combining the two data types can increase the predictive capabilities of machine learning models in diagnosing PTSD. While the deep-learning model outperformed the traditional ensemble model in processing text data, the ensemble classifier obtained better results when ingesting a combination of features from both data types in the serial mixed model.
The study demonstrated that unstructured encounter notes enhance a model's ability to identify patients diagnosed with PTSD. These findings can enhance quality improvement, research, and disease surveillance related to PTSD in primary care populations.
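The sensitivity, PPV, and F-measure reported above are all derived from confusion-matrix counts on the hold-out set. The following is a minimal sketch of those standard definitions, not the study's evaluation code: the function name `classification_metrics` and the counts are hypothetical, chosen only to show how a skewed test set can yield very high accuracy while sensitivity and PPV remain moderate.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from confusion counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives. Zero denominators fall back to 0.0.
    """
    total = tp + fp + fn + tn
    # Sensitivity (recall): share of true PTSD cases the model flags.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # PPV (precision): share of flagged patients who are true cases.
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    # F-measure: harmonic mean of sensitivity and PPV.
    denom = sensitivity + ppv
    f_measure = 2 * sensitivity * ppv / denom if denom else 0.0
    # Accuracy is included only to show how it is inflated by class skew.
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "sensitivity": sensitivity,
        "ppv": ppv,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical counts for a skewed test set (NOT taken from the study):
# 40 true cases among 1,000 patients.
metrics = classification_metrics(tp=30, fp=10, fn=10, tn=950)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these illustrative counts, accuracy is 0.98 even though a quarter of the true cases are missed, which is why sensitivity, PPV, and F-measure are more informative than accuracy when positives are rare.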