Ateya Mohammad, Aristeridou Danai, Sands George H, Zielinski Jessica, Grout Randall W, Colavecchia A Carmine, Wazni Oussama, Haque Saira N
Pfizer Inc, New York, New York.
Regenstrief Institute, Indianapolis, Indiana.
Heart Rhythm O2. 2024 Sep 26;5(12):925-935. doi: 10.1016/j.hroo.2024.09.010. eCollection 2024 Dec.
Prediction models for atrial fibrillation (AF) may enable earlier detection and guideline-directed treatment decisions. However, model bias may lead to inaccurate predictions and unintended consequences.
The purpose of this study was to validate "UNAFIED-10," a 2-year, 10-variable predictive model of undiagnosed AF originally developed using Indiana Network for Patient Care regional data, in a national data set, and to assess its bias and improve its generalizability.
UNAFIED-10 was validated and optimized using the Optum de-identified electronic health record data set. AF diagnoses were recorded in the January 2018-December 2019 period (outcome period), with January 2016-December 2017 as the baseline period. Validation cohorts (patients with AF and non-AF controls, aged ≥40 years) comprised the full imbalanced data set and a randomly sampled balanced data set. Model performance and bias were evaluated in patient subpopulations defined by sex, insurance, race, and region.
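The cohort construction described above can be illustrated with a minimal sketch. It assumes the balanced data set was obtained by randomly downsampling the larger class to match the smaller one; the abstract does not state the exact sampling procedure. The record fields `af` and `sex` and the helper names `balanced_sample` and `group_by` are hypothetical, and the data are synthetic.

```python
import random

def balanced_sample(records, label_key="af", seed=0):
    """Randomly downsample so cases and controls are equal in number (assumed procedure)."""
    rng = random.Random(seed)
    cases = [r for r in records if r[label_key] == 1]
    controls = [r for r in records if r[label_key] == 0]
    n = min(len(cases), len(controls))
    return rng.sample(cases, n) + rng.sample(controls, n)

def group_by(records, key):
    """Split records into subpopulations (e.g. by sex, insurance, race, or region)."""
    groups = {}
    for r in records:
        groups.setdefault(r[key], []).append(r)
    return groups

# Synthetic, imbalanced toy cohort: 10 cases and 90 controls.
records = [
    {"af": 1 if i % 10 == 0 else 0, "sex": "F" if (i // 10) % 2 else "M"}
    for i in range(100)
]

balanced = balanced_sample(records)
by_sex = group_by(balanced, "sex")
```

Each subgroup in `by_sex` could then be scored separately to compare performance across subpopulations.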
Of the 6,058,657 eligible patients (mean age 60 ± 12 years), 4.1% (n = 246,975) had their first AF diagnosis within the outcome period. The validated UNAFIED-10 model achieved a higher C-statistic (0.85 [95% confidence interval 0.85-0.86] vs 0.81 [0.80-0.81]) and sensitivity (86% vs 74%) but lower specificity (66% vs 74%) than the original UNAFIED-10 model. During retraining and optimization, the variables insurance, shock, and albumin were excluded to address bias and improve generalizability. This generated an 8-variable model (UNAFIED-8) with consistent performance.
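For readers less familiar with the reported metrics, the following sketch shows how a C-statistic, sensitivity, and specificity are conventionally computed from predicted risk scores. The scores, labels, and the 0.5 threshold are illustrative toy values, not data or settings from the study. The only figures taken from the abstract are the two patient counts used in the prevalence check.

```python
def c_statistic(scores, labels):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(scores, labels, threshold):
    """Classify score >= threshold as predicted AF and return (sensitivity, specificity)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Toy risk scores and outcomes (1 = AF diagnosed in the outcome period).
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

auc = c_statistic(scores, labels)
sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)

# Prevalence check using the counts reported in the abstract.
prevalence_pct = round(100 * 246_975 / 6_058_657, 1)
```

Lowering the threshold trades specificity for sensitivity, which is the pattern reported for the validated model relative to the original.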
UNAFIED-10, developed using regional patient data, displayed consistent performance in a large national data set. UNAFIED-8 is more parsimonious and more generalizable, and may be better suited to applying advanced analytics to AF detection. Future directions include validation on additional data sets.