University of Michigan, Ann Arbor, MI 48109, USA.
Sci Transl Med. 2012 Apr 25;4(131):131ra49. doi: 10.1126/scitranslmed.3003561.
Conventional algorithms for modeling clinical events focus on characterizing the differences between patients with varying outcomes in the historical data sets used for model derivation. For many low-prevalence clinical conditions where only small data sets are available, this approach is challenging because few positive (that is, event) examples are available for model training. Here, we investigate how the development of clinical models might be improved across three distinct patient populations (patients with acute coronary syndrome enrolled in the DISPERSE2-TIMI33 and MERLIN-TIMI36 trials, patients undergoing inpatient surgery in the National Surgical Quality Improvement Program registry, and patients undergoing percutaneous coronary intervention in the Blue Cross Blue Shield of Michigan Cardiovascular Consortium registry). For each of these cases, we supplement an incomplete characterization of patient outcomes in the derivation data set (uncensored view of the data) with an additional characterization of the extent to which patients differ from the statistical support of their clinical characteristics (censored view of the data). Our approach exploits the same training data within the derivation cohort in multiple ways to improve the accuracy of prediction. We position this approach within the context of traditional supervised (2-class) and unsupervised (1-class) learning methods and present a 1.5-class approach for clinical decision-making. We describe a 1.5-class support vector machine (SVM) classification algorithm that implements this approach and report on its performance relative to logistic regression and to 2-class SVM classification with cost-sensitive weighting and oversampling. The 1.5-class SVM algorithm improved prediction accuracy relative to these other approaches and may have value in predicting clinical events both at the bedside and for risk-adjusted quality of care assessment.
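The idea of using the same derivation cohort in two ways can be illustrated with a small sketch. This is not the 1.5-class SVM described in the abstract: it swaps in plain logistic regression for the supervised (2-class) view and a z-score distance from the cohort center for the unsupervised (1-class) view, then blends the two scores. The function names, the blending weight `alpha`, and the toy cohort values are all hypothetical and chosen only for illustration.

```python
import math

def cohort_stats(rows):
    """Per-feature mean and population standard deviation of the derivation cohort."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # guard against constant features
    return means, stds

def zscores(x, stats):
    means, stds = stats
    return [(xj - m) / s for xj, m, s in zip(x, means, stds)]

def anomaly_score(x, stats):
    """1-class view: distance from the cohort center in z-score units (labels unused)."""
    return math.sqrt(sum(z * z for z in zscores(x, stats)))

def fit_logistic(rows, labels, stats, lr=0.1, epochs=500):
    """2-class view: logistic regression fitted by batch gradient descent."""
    zs = [zscores(r, stats) for r in rows]
    w, b, n = [0.0] * len(zs[0]), 0.0, len(zs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for z, y in zip(zs, labels):
            p = 1.0 / (1.0 + math.exp(-(b + sum(wj * zj for wj, zj in zip(w, z)))))
            for j, zj in enumerate(z):
                grad_w[j] += (p - y) * zj / n
            grad_b += (p - y) / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def supervised_risk(x, model, stats):
    w, b = model
    z = zscores(x, stats)
    return 1.0 / (1.0 + math.exp(-(b + sum(wj * zj for wj, zj in zip(w, z)))))

def combined_risk(x, model, stats, alpha=0.5):
    """Blend both views; alpha=1 is purely supervised, alpha=0 purely unsupervised."""
    d = anomaly_score(x, stats)
    unsupervised = d / (1.0 + d)  # squash the distance into [0, 1)
    return alpha * supervised_risk(x, model, stats) + (1.0 - alpha) * unsupervised

# Toy derivation cohort (made-up values): two features, one event among six patients.
cohort = [[50.0, 1.0], [52.0, 1.1], [48.0, 0.9], [51.0, 1.0], [49.0, 1.0], [70.0, 2.0]]
events = [0, 0, 0, 0, 0, 1]
stats = cohort_stats(cohort)
model = fit_logistic(cohort, events, stats)
typical, atypical = [50.0, 1.0], [75.0, 2.2]
print(combined_risk(typical, model, stats), combined_risk(atypical, model, stats))
```

With only one event in the toy cohort, the supervised score alone rests on very little evidence. The distance term adds information drawn from all six patients, which is the intuition behind combining the two views. A faithful implementation would instead solve the joint SVM formulation described in the paper.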