Royston Patrick, Altman Douglas G, Sauerbrei Willi
MRC Clinical Trials Unit, 222 Euston Road, London NW1 2DA, UK.
Stat Med. 2006 Jan 15;25(1):127-41. doi: 10.1002/sim.2331.
In medical research, continuous variables are often converted into categorical variables by grouping values into two or more categories. We consider in detail issues pertaining to creating just two groups, a common approach in clinical research. We argue that the simplicity achieved is gained at a cost; dichotomization may create rather than avoid problems, notably a considerable loss of power and residual confounding. In addition, the use of a data-derived 'optimal' cutpoint leads to serious bias. We illustrate the impact of dichotomization of continuous predictor variables using as a detailed case study a randomized trial in primary biliary cirrhosis. Dichotomization of continuous data is unnecessary for statistical analysis and in particular should not be applied to explanatory variables in regression models.
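The loss of power described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper. It assumes a normally distributed predictor, a linear effect on a normally distributed outcome, a median split as the dichotomization rule, and a Fisher z test of zero correlation (a large-sample approximation). The function names and the values of `n`, `beta`, and `n_sims` are hypothetical choices made for illustration only.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def is_significant(r, n, z_crit=1.959964):
    """Two-sided 5% test of zero correlation using Fisher's z approximation."""
    return abs(math.atanh(r)) * math.sqrt(n - 3) > z_crit


def simulate_power(n=100, beta=0.25, n_sims=2000, seed=1):
    """Estimate power with x kept continuous versus x split at its sample median.

    Assumed model (illustrative only): x ~ N(0, 1), y = beta * x + N(0, 1) noise.
    Returns (power_continuous, power_dichotomized).
    """
    rng = random.Random(seed)
    hits_cont = 0
    hits_dich = 0
    for _ in range(n_sims):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [beta * xi + rng.gauss(0, 1) for xi in x]
        cut = statistics.median(x)  # median split: the two-group rule assumed here
        x_bin = [1.0 if xi > cut else 0.0 for xi in x]
        hits_cont += is_significant(pearson_r(x, y), n)
        hits_dich += is_significant(pearson_r(x_bin, y), n)
    return hits_cont / n_sims, hits_dich / n_sims


if __name__ == "__main__":
    power_cont, power_dich = simulate_power()
    print(f"power, continuous x:   {power_cont:.2f}")
    print(f"power, dichotomized x: {power_dich:.2f}")
```

Under these assumptions the dichotomized analysis should reject the null hypothesis noticeably less often than the continuous one. For a normal predictor split at the median, the correlation with the outcome is attenuated by a factor of about sqrt(2/pi), roughly 0.80. The sketch covers only the power argument; it does not address residual confounding or the bias from data-derived 'optimal' cutpoints.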