Department of Sports Science and Clinical Biomechanics, University of Southern Denmark, Campusvej 55, DK-5230 Odense M, Denmark.
Department of Management, Technology, and Economics, ETH Zurich, Weinbergstr. 56/58, 8092 Zurich, Switzerland.
J Clin Epidemiol. 2020 Jun;122:27-34. doi: 10.1016/j.jclinepi.2020.02.005. Epub 2020 Feb 22.
Conceptually oriented preprocessing of a large number of potential prognostic factors may improve the development of a prognostic model. This study investigated whether various forms of conceptually oriented preprocessing, or the preselection of established factors, were superior to using all factors as input.
We made use of an existing project that developed two conceptually oriented subgroupings of low back pain patients. Based on the prediction of six outcome variables by seven statistical methods, this type of preprocessing was compared with medical experts' preselection of established factors, as well as using all 112 available baseline factors.
Subgrouping of patients was associated with low prognostic capacity. Lasso-based variable selection, applied either to all factors or to domain-specific principal component scores, performed best. The preselection of established factors offered a good compromise between model complexity and prognostic capacity.
Prognostic capacity is hard to improve through conceptually oriented preprocessing when compared with purely statistical approaches. However, a careful selection of already established factors, combined in a simple linear model, should be considered as an option when constructing a new prognostic rule from a large number of potential prognostic factors.
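As an illustration of what Lasso-based variable selection over a pool of candidate factors can look like, the following is a minimal sketch in plain Python. It is not the study's implementation: the data are simulated, the 20 candidate factors, the three "true" factors, the penalty value `lam`, and all function names are assumptions made for this example only. It fits a Lasso by cyclic coordinate descent on standardized factors and reports which factors keep a nonzero coefficient.

```python
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def standardize(rows):
    """Return the factors column-wise, each with mean 0 and variance 1."""
    n, p = len(rows), len(rows[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in rows]
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5 or 1.0
        cols.append([(v - mean) / sd for v in col])
    return cols


def lasso_cd(cols, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes standardized columns, so each update reduces to a soft-threshold.
    """
    n, p = len(y), len(cols)
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual with all coefficients at zero
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            xj = cols[j]
            # Correlation of factor j with the partial residual (factor j excluded).
            rho = sum(x * r for x, r in zip(xj, resid)) / n + beta[j]
            new = soft_threshold(rho, lam)
            if new != beta[j]:
                delta = new - beta[j]
                resid = [r - x * delta for x, r in zip(xj, resid)]
                beta[j] = new
    return beta


# Simulated data (assumption): 200 patients, 20 candidate baseline factors,
# of which only the first three influence the outcome.
random.seed(0)
n_patients, n_factors = 200, 20
X = [[random.gauss(0, 1) for _ in range(n_factors)] for _ in range(n_patients)]
y = [2.0 * r[0] - 1.5 * r[1] + 1.0 * r[2] + random.gauss(0, 0.5) for r in X]

beta = lasso_cd(standardize(X), y, lam=0.3)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected factor indices:", selected)
```

In practice the penalty would be chosen by cross-validation rather than fixed, and the same routine could be run on domain-specific principal component scores instead of the raw factors.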