Institute of Health Policy, Management and Evaluation (IHPME), University of Toronto, Toronto, ON, Canada.
Peter Munk Cardiac Centre, Toronto General Hospital, University Health Network (UHN), Toronto, ON, Canada.
BMC Med Inform Decis Mak. 2022 Apr 6;22(1):93. doi: 10.1186/s12911-022-01837-2.
Routinely collected administrative data are widely used for population-based research. However, although clinically very different, atrial septal defects (ASD) and patent foramen ovale (PFO) share a single diagnostic code (ICD-9: 745.5, ICD-10: Q21.1). Using machine learning-based approaches, we developed and validated an algorithm to differentiate between PFO and ASD patient populations within healthcare administrative data.
Using data housed at ICES, we identified patients who underwent transcatheter closure in Ontario between October 2002 and December 2017 using a Canadian Classification of Interventions code (1HN80GPFL, N = 4680). A novel random forest model was developed using demographic and clinical information to differentiate patients who underwent transcatheter closure for PFO from those who underwent it for ASD. Patients who had undergone transcatheter closure and had records in the CorHealth Ontario cardiac procedure registry (N = 1482) served as the reference standard. Several algorithms were tested and evaluated for accuracy, sensitivity, and specificity. Variable importance was examined via mean decrease in Gini index.
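To illustrate the quantity behind "mean decrease in Gini index", the following is a minimal sketch of the Gini impurity decrease for a single node split. In a random forest this decrease is, roughly speaking, accumulated over every split that uses a given variable and averaged across trees. The toy labels and both splitting masks are hypothetical and are not drawn from the study data; this is not the authors' code.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(labels, goes_left):
    """Impurity decrease when a node is split in two by a boolean mask."""
    left = [y for y, m in zip(labels, goes_left) if m]
    right = [y for y, m in zip(labels, goes_left) if not m]
    n = len(labels)
    weighted_child_impurity = (
        (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    )
    return gini(labels) - weighted_child_impurity


# Hypothetical toy node: 4 PFO closures and 4 ASD closures.
labels = ["PFO"] * 4 + ["ASD"] * 4

# Hypothetical split A (e.g. on an age-group indicator): mostly separates the classes.
split_a = [True, True, True, False, False, False, False, True]

# Hypothetical split B: sends half of each class to each side, so it is uninformative.
split_b = [True, False, True, False, True, False, True, False]

decrease_a = gini_decrease(labels, split_a)
decrease_b = gini_decrease(labels, split_b)
print(decrease_a, decrease_b)
```

A variable whose splits produce larger decreases, as split A does here, ranks higher in this kind of importance measure.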
We tested 7 models in total. The final model comprised 24 variables covering demographic, comorbidity, and procedural information. After hyperparameter tuning, the final model achieved 0.76 accuracy, 0.76 sensitivity, and 0.75 specificity. Patient age group had the greatest influence on node impurity, and thus ranked highest in variable importance.
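The three reported metrics can be computed from a confusion matrix against the reference standard, as in the minimal sketch below. The abstract does not state which class was treated as positive, so the choice of "PFO" here is an assumption, and the labels and predictions are hypothetical toy values rather than study results.

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity, and specificity for a binary classification.

    Assumes both classes appear in y_true, so neither denominator is zero.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1  # positive case correctly identified
        elif truth == positive:
            fn += 1  # positive case missed
        elif pred == positive:
            fp += 1  # negative case wrongly flagged as positive
        else:
            tn += 1  # negative case correctly identified
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical reference-standard labels and model predictions.
y_true = ["PFO", "PFO", "PFO", "PFO", "ASD", "ASD", "ASD", "ASD"]
y_pred = ["PFO", "PFO", "PFO", "ASD", "ASD", "ASD", "ASD", "PFO"]

metrics = classification_metrics(y_true, y_pred, positive="PFO")
print(metrics)
```

Swapping the positive class exchanges sensitivity and specificity, so the labelling convention should be stated alongside any reported values.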
Our random forest classification method achieved reasonable accuracy in identifying PFO and ASD closure in administrative data. The algorithm can now be applied to evaluate long-term PFO and ASD closure outcomes in Ontario, pending external validation in future studies.