Kwiatkowski Fabrice, Slim Karem, Verrelle Pierre, Chamorey Emmanuel, Kramar Andrew
Unité de recherche clinique, Centre Jean Perrin, 58 rue Montalembert, 63011 Clermont-Ferrand.
Bull Cancer. 2007 Jul;94(7):680-6.
The propensity score, an indicator of the propensity to receive one treatment rather than another (among two or more), is used in non-randomized studies, whether prospective or retrospective. It is calculated after identifying the factors that predict treatment assignment, and corresponds to the probability of receiving one of the treatments conditional on the variables observed before treatment. This probability is usually estimated with a logistic regression equation. The score summarizes a whole set of parameters in a single value. It can be used as a covariate in other multivariate models that aim to evaluate, with a reduced risk of confounding, the impact of treatment modalities on end-points such as survival, morbidity, side effects or quality of life. It is also very convenient for matching or stratification, so that these end-points can be compared among the resulting subgroups. Although this makes it possible to obtain similar subgroups a posteriori, the method cannot claim to reach the level of evidence of randomized trials, because the absence of bias is never guaranteed. Apart from this major methodological weakness, the propensity score appears less useful in studies with large samples, since in such cases multivariate models can include enough covariates to yield stable conclusions. When samples are small, the score remains of interest, although its reliability again depends on sample size and conclusions must be qualified. Examples are included to illustrate the topic.
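The two steps described in the abstract, estimating the score by logistic regression and then stratifying on it, can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' implementation: the data are synthetic, the two covariates and their effect sizes are invented, the logistic model is fitted by plain gradient descent using only the Python standard library, and the function names (`fit_logistic`, `propensity_scores`, `stratify`) are hypothetical. The choice of five strata is likewise an assumption of this sketch.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear(w, x):
    """Intercept w[0] plus the dot product of w[1:] with covariates x."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(X, t, lr=0.5, epochs=500):
    """Fit P(T=1 | X) by batch gradient descent; returns [intercept, coefs...]."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, ti in zip(X, t):
            err = sigmoid(linear(w, xi)) - ti
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def propensity_scores(X, w):
    """Estimated probability of receiving treatment 1 for each subject."""
    return [sigmoid(linear(w, xi)) for xi in X]


def stratify(scores, n_strata=5):
    """Assign subjects to equal-sized strata by rank of their score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    strata = [0] * len(scores)
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // len(scores)
    return strata


# Synthetic, non-randomized data (assumption): treatment assignment
# depends on two pre-treatment covariates.
random.seed(0)
n = 300
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(n)]
t = [1 if random.random() < sigmoid(1.5 * x[0] - 0.5 * x[1]) else 0 for x in X]

w = fit_logistic(X, t)
scores = propensity_scores(X, w)
strata = stratify(scores, n_strata=5)

# Within each stratum, treated and control subjects have similar scores,
# so end-points can be compared stratum by stratum.
for s in range(5):
    members = [i for i in range(n) if strata[i] == s]
    n_treated = sum(t[i] for i in members)
    print(f"stratum {s}: {n_treated} treated, {len(members) - n_treated} control")
```

Strata with very few treated or control subjects signal poor overlap between the groups, which is one practical way the sample-size caveat raised in the abstract shows up.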