Regenstrief Institute, Indianapolis, IN, United States.
Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, United States.
J Med Internet Res. 2021 Nov 15;23(11):e31337. doi: 10.2196/31337.
The COVID-19 pandemic has highlighted the inability of health systems to leverage existing system infrastructure to rapidly develop and apply broad analytical tools that could inform state- and national-level policymaking, as well as patient care delivery in hospital settings. The pandemic has also highlighted systemic disparities in health outcomes and access to care based on race or ethnicity, gender, income level, and the urban-rural divide. Although the United States seems to be recovering from the COVID-19 pandemic owing to widespread vaccination efforts and increased public awareness, there is an urgent need to address these challenges.
This study aims to assess the feasibility of leveraging broad, statewide datasets for population health-driven decision-making by developing robust analytical models that predict COVID-19-related health care resource utilization across patients served by Indiana's statewide Health Information Exchange.
We leveraged comprehensive datasets obtained from the Indiana Network for Patient Care to train decision forest-based models that predict patient-level need for health care resource utilization. To assess these models for potential biases, we tested model performance across subpopulations stratified by age, race or ethnicity, gender, and residence (urban vs rural).
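The stratified bias check described above amounts to scoring the same model separately on each subpopulation. The sketch below shows one way to do that for the metrics the abstract reports (precision, accuracy, sensitivity, and area under the ROC curve). It is a minimal illustration under stated assumptions, not the authors' pipeline: the record fields `label` and `score`, the 0.5 decision threshold, and the function names `stratified_metrics` and `subgroup_metrics` are hypothetical, and training of the decision forest itself is not shown.

```python
from collections import defaultdict


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary (0/1) labels."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, fp, tn, fn


def ratio(num, den):
    # Undefined metrics (e.g. no predicted positives) are reported as NaN.
    return num / den if den else float("nan")


def auroc(y_true, scores):
    """AUROC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def subgroup_metrics(y_true, scores, threshold=0.5):
    """Threshold the model scores (hypothetical 0.5 cutoff) and compute the reported metrics."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, len(y_true)),
        "sensitivity": ratio(tp, tp + fn),
        "auroc": auroc(y_true, scores),
    }


def stratified_metrics(records, group_key, threshold=0.5):
    """Group records by a stratification attribute (e.g. residence) and score each subgroup separately."""
    groups = defaultdict(lambda: ([], []))
    for r in records:
        labels, scores = groups[r[group_key]]
        labels.append(r["label"])
        scores.append(r["score"])
    return {g: subgroup_metrics(l, s, threshold) for g, (l, s) in groups.items()}
```

For example, calling `stratified_metrics(records, "residence")` on a list of dictionaries such as `{"residence": "rural", "label": 1, "score": 0.8}` returns one metric dictionary per residence category. Gaps between those dictionaries are what a stratified bias analysis inspects.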
For model development, we identified a cohort of 96,026 patients from 957 zip codes across Indiana, United States. We trained decision models to predict health care resource utilization using approximately 100 of the most impactful features out of a total of 1172 features created. Each model, and each stratified subpopulation under test, reported precision scores >70%, accuracy and area under the receiver operating characteristic curve scores >80%, and sensitivity scores of approximately 90% or higher. We noted statistically significant variations in model performance across subpopulations stratified by age, race or ethnicity, gender, and residence (urban vs rural).
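The abstract does not state which statistical test underlies the reported significant variations. As an illustration only, one common way to compare a proportion-type metric such as sensitivity (true positives over all positives) between two subpopulations is a two-sided, two-proportion z-test with a pooled proportion. The function name `two_proportion_z_test` and the example counts are hypothetical and are not results from this study.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test of H0: p_a == p_b using the pooled proportion.

    For sensitivity, successes are true positives and n is the number of
    actual positives (TP + FN) in each subpopulation.
    Returns (z, p_value).
    """
    if n_a == 0 or n_b == 0:
        raise ValueError("both subpopulations need at least one observation")
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both groups are all-success or all-failure: no detectable difference.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal tail: P(|Z| >= |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

With hypothetical counts of 180/200 positives detected in one subgroup and 160/200 in another (sensitivity 0.90 vs 0.80), `two_proportion_z_test(180, 200, 160, 200)` gives z of roughly 2.8 and a p-value below 0.01. With tens of thousands of patients, even small metric gaps can reach significance, so effect sizes deserve reporting alongside p-values.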
This study shows that it is possible to develop decision models capable of predicting patient-level health care resource utilization across a broad, statewide region with considerable predictive performance. However, our models show statistically significant variations in performance across stratified subpopulations of interest. Further efforts are necessary to identify the root causes of these biases and to rectify them.