Rahman Raziur, Haider Saad, Ghosh Souparno, Pal Ranadip
Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX, USA.
Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX, USA.
Cancer Inform. 2016 Mar 31;14(Suppl 5):57-73. doi: 10.4137/CIN.S30794. eCollection 2015.
Random forests, which consist of an ensemble of equally weighted regression trees, are frequently used for the design of predictive models. In this article, we extend the methodology by representing the regression trees as probabilistic trees and analyzing the nature of the resulting heteroscedasticity. The probabilistic tree representation allows confidence intervals (CIs) to be computed analytically, and optimizing the tree weights is expected to yield tighter CIs with comparable mean error. We approach the prediction of the ensemble of probabilistic trees from two perspectives: as a mixture distribution and as a weighted sum of correlated random variables. We applied the methodology to the drug sensitivity prediction problem on synthetic data and on the Cancer Cell Line Encyclopedia (CCLE) dataset, and show that tree weights can be selected to reduce the average CI length without increasing the mean error.
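The two perspectives named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each tree reports a predictive mean and variance for a query point, that the per-tree predictions are jointly Gaussian in the weighted-sum view, and that a normal approximation is acceptable for the mixture CI. The function names (`mixture_moments`, `weighted_sum_moments`, `normal_ci`) and the numbers are hypothetical.

```python
import numpy as np

def mixture_moments(weights, means, variances):
    """Mean and variance of a weighted mixture of per-tree distributions."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalize weights to sum to 1
    mu = np.asarray(means, dtype=float)
    var = np.asarray(variances, dtype=float)
    mix_mean = np.sum(w * mu)
    # Law of total variance: E[X^2] - (E[X])^2 over the mixture components.
    mix_var = np.sum(w * (var + mu ** 2)) - mix_mean ** 2
    return mix_mean, mix_var

def weighted_sum_moments(weights, means, cov):
    """Mean and variance of sum_i w_i * X_i for correlated tree outputs X_i."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    mu = np.asarray(means, dtype=float)
    cov = np.asarray(cov, dtype=float)
    return w @ mu, w @ cov @ w           # variance is the quadratic form w' C w

def normal_ci(mean, var, z=1.96):
    """Two-sided interval under a normal approximation (about 95% for z=1.96)."""
    half = z * np.sqrt(var)
    return mean - half, mean + half

# Hypothetical per-tree predictions for one query point (assumed values).
means = [1.0, 2.0, 3.0]
variances = [0.5, 0.2, 0.8]
cov = np.diag(variances)                 # assumes uncorrelated trees for brevity

equal = [1 / 3, 1 / 3, 1 / 3]
tuned = [0.2, 0.6, 0.2]                  # more weight on the low-variance tree

m_eq, v_eq = weighted_sum_moments(equal, means, cov)
m_tu, v_tu = weighted_sum_moments(tuned, means, cov)
ci_eq = normal_ci(m_eq, v_eq)
ci_tu = normal_ci(m_tu, v_tu)
mix_mean, mix_var = mixture_moments(equal, means, variances)
```

In this toy case both weightings give the same predicted mean, while the tuned weights give a smaller variance and therefore a shorter interval, which is the kind of trade-off the abstract describes. The mixture view yields a larger variance than the weighted-sum view because it also accounts for the spread between tree means.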