Sims C J, Meyn L, Caruana R, Rao R B, Mitchell T, Krohn M
Department of Obstetrics, Gynecology, and Reproductive Sciences, Magee-Womens Research Institute, University of Pittsburgh School of Medicine, Pennsylvania 15213, USA.
Am J Obstet Gynecol. 2000 Nov;183(5):1198-206. doi: 10.1067/mob.2000.108891.
The purpose of this study was to determine whether decision tree-based methods can be used to predict cesarean delivery.
This was a historical cohort study of women who delivered live-born singleton neonates from 1995 through 1997 (n = 22,157). The frequency of cesarean delivery was 17%, and 78 variables were used for analysis. Decision tree rule-based methods and logistic regression models were each applied to the same 50% of the sample to develop the predictive models, and these models were then tested on the remaining 50%.
Decision tree receiver operating characteristic curve areas were as follows: nulliparous, 0.82; parous, 0.93. Logistic regression receiver operating characteristic curve areas were as follows: nulliparous, 0.86; parous, 0.93. Decision tree methods and logistic regression methods used similar predictive variables; however, logistic regression required more variables and yielded less intelligible models. Among the 6 decision tree building methods tested, the strict minimum message length criterion yielded decision trees that were small yet accurate. Risk factor variables were identified in 676 nulliparous cesarean deliveries (69%) and 419 parous cesarean deliveries (47.6%).
Decision tree models can be used to predict cesarean delivery. Models built with strict minimum message length decision trees have the following attributes: their performance is comparable to that of logistic regression; they are small enough to be intelligible to physicians; they reveal causal dependencies among variables not detected by logistic regression; they handle missing values more easily than logistic methods do; and they predict cesarean deliveries that lack a categorized risk factor variable.
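The evaluation described above (develop models on one half of the sample, test on the other half, and compare receiver operating characteristic curve areas) can be illustrated with a minimal sketch. This is not the authors' code: the function names `split_half` and `roc_auc`, the random seed, and the toy labels and scores are all assumptions made for illustration. The area is computed with the standard rank-based (Mann-Whitney) identity, in which the area equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
import random


def split_half(records, seed=0):
    """Shuffle a copy of the records and split it into two halves.

    Hypothetical helper: the abstract only states that 50% of the sample
    was used for model development and the remaining 50% for testing.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney identity.

    labels: 1 for the positive class (e.g. cesarean delivery), 0 otherwise.
    scores: model outputs, where higher means more likely positive.
    Tied positive/negative pairs count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy data only; not taken from the study.
    toy_labels = [1, 0, 1, 0]
    toy_scores = [0.9, 0.8, 0.3, 0.1]
    print(roc_auc(toy_labels, toy_scores))  # 3 of 4 pairs ranked correctly: 0.75
```

The pairwise loop is quadratic in the number of cases, which is fine for a sketch; a sort-based rank computation would be the usual choice for a cohort of this size.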