Zulbayar Suvd, Mollayeva Tatyana, Colantonio Angela, Chan Vincy, Escobar Michael
Dalla Lana School of Public Health, University of Toronto, Toronto, ON M5T 3M7, Canada.
Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON M5T 3M6, Canada.
Intell Based Med. 2023;8. doi: 10.1016/j.ibmed.2023.100118. Epub 2023 Nov 8.
This work aimed to identify pre-existing health conditions of patients with traumatic brain injury (TBI) and to develop predictive models for the first TBI event and its external causes by combining unsupervised and supervised learning algorithms. We acquired up to five years of pre-injury diagnoses for 488,107 patients with TBI and 488,107 matched control patients who entered the emergency department or acute care hospitals between April 1, 2002, and March 31, 2020. Diagnoses were obtained from the Ontario Health Insurance Plan (OHIP) database, which contains province-wide physician claims data for inpatient and outpatient services in Ontario, Canada. The OHIP diagnostic codes were screened to limit the subsequent analysis to codes predictive of TBI; 314 codes were found to be significantly associated with TBI. A Latent Dirichlet Allocation (LDA) model applied to these diagnostic codes yielded an optimal number of 19 topics, which concur with the published literature but also point to unexplored areas. The estimated word-topic probabilities from the LDA model helped us detect pre-morbid conditions among patients with TBI by uncovering underlying patterns of diagnoses, while the estimated document-topic probabilities were used to create variables as a form of dimension reduction. We created 19 topic scores for each patient in the cohort and used them, together with socio-demographic factors, as inputs to Random Forest binary classifier models. Test set performance, evaluated using the area under the receiver operating characteristic curve (AUC), was as follows: TBI event (AUC = 0.85); external causes of injury: falls (AUC = 0.85), struck by/against (AUC = 0.83), cyclist collision (AUC = 0.76), and motor vehicle collision (AUC = 0.83).
Our analysis successfully demonstrated the feasibility of using machine learning to predict TBI due to various external causes and identified the most important factors that contribute to this prediction.
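The pipeline described in the abstract (LDA topic scores used as reduced-dimension inputs to a Random Forest, evaluated by test-set AUC) can be illustrated with a minimal sketch. This is not the authors' code: it assumes scikit-learn, uses a small synthetic patient-by-code count matrix in place of OHIP data, and fits 5 topics rather than 19 to keep the run short. All variable names, the synthetic age and sex covariates, and the hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the patient-by-diagnostic-code count matrix
# (assumption: NOT real OHIP data; sizes are arbitrary).
n_patients, n_codes, n_topics = 600, 40, 5
y = rng.integers(0, 2, size=n_patients)            # 1 = case, 0 = control
base_rate = np.full(n_codes, 0.3)
case_rate = base_rate.copy()
case_rate[:8] = 1.0                                 # codes 0-7 occur more often in cases
rates = np.where(y[:, None] == 1, case_rate, base_rate)
counts = rng.poisson(rates)                         # shape: (n_patients, n_codes)

# Hypothetical socio-demographic covariates.
age = rng.integers(18, 90, size=n_patients)
sex = rng.integers(0, 2, size=n_patients)

# Hold out a test set before fitting anything.
idx_train, idx_test = train_test_split(
    np.arange(n_patients), test_size=0.3, random_state=0, stratify=y
)

# Unsupervised step: fit LDA on training patients only.
lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
topics_train = lda.fit_transform(counts[idx_train])  # document-topic proportions
topics_test = lda.transform(counts[idx_test])

# Word-topic probabilities: normalise each topic's code weights to sum to 1.
word_topic = lda.components_ / lda.components_.sum(axis=1, keepdims=True)

# Supervised step: topic scores plus covariates feed a Random Forest.
X_train = np.column_stack([topics_train, age[idx_train], sex[idx_train]])
X_test = np.column_stack([topics_test, age[idx_test], sex[idx_test]])
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y[idx_train])

# Evaluate with the area under the ROC curve on the held-out patients.
auc = roc_auc_score(y[idx_test], rf.predict_proba(X_test)[:, 1])
print(f"test AUC on synthetic data: {auc:.2f}")
```

In this sketch each patient plays the role of a "document" and each diagnostic code a "word", so the rows of `topics_train` correspond to per-patient topic scores and the rows of `word_topic` show which codes characterise each topic. The AUC printed here reflects only the synthetic data and says nothing about the values reported in the abstract.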