Computer Science Department and Technology, University of Bedfordshire, UK.
Artif Intell Med. 2023 Sep;143:102634. doi: 10.1016/j.artmed.2023.102634. Epub 2023 Aug 14.
Decision tree (DT) models provide a transparent approach to predicting patient outcomes within a probabilistic framework. Under certain conditions, averaging over DT models can deliver reliable estimates of predictive posterior probability distributions, which is critically important when predicting an individual patient's outcome. Such estimates can be obtained within the Bayesian framework using Markov chain Monte Carlo (MCMC) and its Reversible Jump extension, which enables DT models to grow to a reasonable size. Existing MCMC strategies, however, have limited ability to control DT structures and tend to sample overgrown DT models that make unreasonably small partitions, which degrades the uncertainty calibration. This happens because the MCMC explores the DT model parameter space with limited knowledge of the distribution of data partitions. We propose a new adaptive strategy that overcomes this limitation and show that, when predicting trauma outcomes, the number of data partitions can be significantly reduced, so that unnecessary uncertainty in estimating the predictive posterior density is avoided. The proposed and existing strategies are compared in terms of entropy, which is calculated for the predicted posterior distributions and represents the uncertainty in decisions. In this comparison, the proposed method outperformed the existing sampling strategies, efficiently avoiding unnecessary uncertainty in decisions.
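The entropy comparison described in the abstract can be illustrated with a minimal sketch. It assumes that each sampled DT model yields a vector of class probabilities for a given patient, that the predictive posterior is approximated by a plain average over the sampled models, and that uncertainty is measured as Shannon entropy in bits. The function names and the numbers are hypothetical and are not taken from the paper; the authors' actual sampling strategy is not reproduced here.

```python
import math


def average_posterior(samples):
    """Average class-probability vectors over sampled models.

    `samples` is a list of equal-length probability vectors, one per
    sampled model (hypothetical input format).
    """
    n_models = len(samples)
    n_classes = len(samples[0])
    return [sum(s[j] for s in samples) / n_models for j in range(n_classes)]


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Illustrative class probabilities for one patient from three sampled trees.
samples = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]]
avg = average_posterior(samples)
h = entropy(avg)
print(avg, h)
```

Under these assumptions, a lower entropy of the averaged distribution corresponds to a less uncertain decision for that patient, which is the sense in which sampling strategies are compared.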