Weinstein Lawrence, Radano Todd A, Jack Timothy, Kalina Philip, Eberhardt John S
Catasys, Inc., Los Angeles, CA, USA.
Perspect Health Inf Manag. 2009 Sep 16;6(Fall):1b.
This paper explores the use of machine learning and Bayesian classification models to develop broadly applicable risk stratification models that guide disease management of health plan enrollees with substance use disorder (SUD). Payers understand the high costs and morbidities associated with SUD. They manage it through utilization review, acute interventions, coverage and cost limitations, and disease management. However, the literature shows mixed results for these modalities in improving patient outcomes and controlling cost. Our objective is to evaluate the potential of data mining methods to identify novel risk factors for chronic disease and to stratify enrollee utilization. These results can be used to develop new methods for targeting disease management services so as to maximize benefits to both enrollees and payers.
For our evaluation, we used DecisionQ machine learning algorithms to build Bayesian network models of a representative sample of data licensed from Thomson Reuters' MarketScan. The sample consisted of 185,322 enrollees, each with three full years of claims records. We prepared the data sets and used a stepwise learning process to train a series of Bayesian belief networks (BBNs). The BBNs were validated on a 10 percent holdout set.
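The abstract does not describe DecisionQ's stepwise structure-learning algorithm, so the following is only a minimal sketch of the train-then-validate idea under stated assumptions. It assumes binary claim-derived features and substitutes a naive Bayes classifier for the stepwise-learned BBNs. Naive Bayes is the simplest Bayesian network: the class node is the sole parent of every feature. The sketch reserves 10 percent of records as a holdout set. All function names, the feature names `sud_dx` and `er_visit`, and the toy data are hypothetical.

```python
import math
import random
from collections import defaultdict

def holdout_split(records, holdout_fraction=0.10, seed=0):
    """Shuffle records and reserve holdout_fraction of them for validation."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_holdout = int(round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]  # (train, holdout)

def train_naive_bayes(train, alpha=1.0):
    """Fit P(class) and P(feature=1 | class) for binary features, Laplace-smoothed."""
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: defaultdict(int))
    for features, label in train:
        class_counts[label] += 1
        for name, value in features.items():
            if value:
                feature_counts[label][name] += 1
    feature_names = {name for features, _ in train for name in features}
    total = sum(class_counts.values())
    priors = {c: n / total for c, n in class_counts.items()}
    likelihoods = {
        c: {f: (feature_counts[c][f] + alpha) / (class_counts[c] + 2 * alpha)
            for f in feature_names}
        for c in class_counts
    }
    return priors, likelihoods

def posterior(model, features):
    """Return P(class | features) for every class, normalised in log space."""
    priors, likelihoods = model
    log_scores = {}
    for c, prior in priors.items():
        s = math.log(prior)
        for f, p in likelihoods[c].items():
            s += math.log(p if features.get(f, 0) else 1.0 - p)
        log_scores[c] = s
    m = max(log_scores.values())
    exp_scores = {c: math.exp(s - m) for c, s in log_scores.items()}
    z = sum(exp_scores.values())
    return {c: v / z for c, v in exp_scores.items()}

# Hypothetical toy data: binary claim-derived flags and a "high"/"low" cost label.
rng = random.Random(42)
records = []
for _ in range(1000):
    sud = rng.random() < 0.3
    er = rng.random() < (0.6 if sud else 0.2)
    label = "high" if (sud and er) or rng.random() < 0.1 else "low"
    records.append(({"sud_dx": int(sud), "er_visit": int(er)}, label))

train, holdout = holdout_split(records)
model = train_naive_bayes(train)
accuracy = sum(
    max(posterior(model, x).items(), key=lambda kv: kv[1])[0] == y
    for x, y in holdout
) / len(holdout)
print(len(train), len(holdout), round(accuracy, 3))
```

A learned BBN would also discover dependencies between features, such as a diagnosis influencing emergency visits. Naive Bayes deliberately ignores these dependencies, which keeps the sketch short.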
The networks were highly predictive. For SUD-positive enrollees, the risk-stratification BBNs produced areas under the curve (AUCs) of 0.948 (95 percent confidence interval [CI], 0.944-0.951) and 0.736 (95 percent CI, 0.721-0.752). For SUD-negative enrollees, they produced AUCs of 0.951 (95 percent CI, 0.947-0.954) and 0.738 (95 percent CI, 0.727-0.750). The cost estimation models produced AUCs ranging from 0.72 (95 percent CI, 0.708-0.731) to 0.961 (95 percent CI, 0.95-0.971).
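The abstract does not state how the AUC confidence intervals were computed. As an illustration only, the sketch below computes AUC from the Mann-Whitney rank statistic. It then derives an approximate 95 percent CI from the Hanley and McNeil (1982) standard error. The choice of CI method is an assumption, and the function names and scores are hypothetical.

```python
import math

def auc_rank(scores_pos, scores_neg):
    """AUC via the Mann-Whitney U statistic, using average ranks for ties."""
    labelled = sorted([(s, 1) for s in scores_pos] + [(s, 0) for s in scores_neg])
    rank_sum_pos = 0.0
    i = 0
    while i < len(labelled):
        # Find the run of tied scores labelled[i..j].
        j = i
        while j + 1 < len(labelled) and labelled[j + 1][0] == labelled[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        rank_sum_pos += avg_rank * sum(lab for _, lab in labelled[i:j + 1])
        i = j + 1
    n_pos, n_neg = len(scores_pos), len(scores_neg)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate CI for AUC from the Hanley & McNeil standard error, clipped to [0, 1]."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    se = math.sqrt(var)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)

# Hypothetical model scores for outcome-positive and outcome-negative enrollees.
scores_pos = [0.9, 0.8, 0.7, 0.6]
scores_neg = [0.5, 0.65, 0.3, 0.2]
auc = auc_rank(scores_pos, scores_neg)
lo, hi = hanley_mcneil_ci(auc, len(scores_pos), len(scores_neg))
print(f"AUC {auc:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

The rank-based formulation runs in O(n log n) rather than comparing every positive-negative pair. This matters for a holdout set of roughly 18,500 enrollees. With only a handful of scores, as in this toy example, the interval is wide. The narrow CIs reported above reflect the much larger holdout set.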
We successfully modeled a large, heterogeneous population of commercial enrollees. We applied state-of-the-art machine learning technology to develop complex and accurate multivariate models that support near-real-time scoring of novel payer populations based on historical claims and diagnostic data. Initial validation results indicate that we can stratify enrollees with SUD diagnoses into different cost categories with a high degree of sensitivity and specificity. The most challenging issue therefore becomes one of policy. SUD carries a social stigma and raises ethical issues about access to care and about individual versus societal benefit. For these reasons, a thoughtful dialogue needs to occur about the appropriate way to implement these technologies.