Truntzer Caroline, Maucort-Boulch Delphine, Roy Pascal
Hospices Civils de Lyon, Service de Biostatistique, Lyon, France.
BMC Bioinformatics. 2008 Oct 15;9:434. doi: 10.1186/1471-2105-9-434.
In cancer research, most clinical variables have already been investigated and are now well established. The use of transcriptomic variables has raised two problems: restricting their number and validating their significance. As a result, their contribution to prognosis is currently thought to be overestimated. The aim of this study was to determine to what extent optimism (the gap between a model's apparent and actual predictive performance) in current transcriptomic models may lead to overestimation of the contribution of transcriptomic variables to survival prognosis.
To achieve this goal, Cox proportional hazards models including both clinical and transcriptomic variables were built. As the relevance of the clinical variables had already been established, they were not subjected to selection. Genes, by contrast, were selected using both univariate and multivariate methods. Optimism and the contribution of clinical and transcriptomic variables to prognosis were compared through simulations, using the Kent and O'Quigley ρ² measure of dependence. We showed that the optimism for clinical variables was low because these variables no longer undergo selection. In contrast, for genes, the selection process introduced high optimism, which increased when the proportion of genes of interest decreased. However, this optimism can be reduced by increasing the number of samples.
Two phenomena have to be taken into account when comparing the predictive power and optimism of clinical variables with those of genes: overestimation for genes due to the selection process, and underestimation for clinical variables due to the omission of relevant genes. Unlike that of genes, the predictive value of validated clinical variables is not overestimated, which should be kept in mind in future studies involving both clinical and transcriptomic variables.
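The selection-induced optimism described above can be illustrated with a small simulation. The following is a minimal sketch and not the authors' procedure: it assumes a continuous outcome and a squared Pearson correlation in place of survival times, Cox models, and the Kent and O'Quigley measure, and it assumes that no gene is truly predictive. All function names and parameter values (`mean_optimism`, 200 genes, 10 selected genes) are hypothetical choices made for illustration.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(rng, n_samples, n_genes):
    """Draw outcome and genes independently (assumption: no gene is predictive)."""
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    genes = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_genes)]
    return outcome, genes


def score(genes, selected):
    """Per-sample signed sum of the selected genes."""
    n = len(genes[0])
    return [sum(sign * genes[j][i] for j, sign in selected) for i in range(n)]


def mean_optimism(n_samples, n_genes=200, n_selected=10, n_reps=5, seed=0):
    """Average gap between apparent and validated squared correlation."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_reps):
        y_train, g_train = simulate(rng, n_samples, n_genes)
        y_valid, g_valid = simulate(rng, n_samples, n_genes)
        # Univariate selection: keep the genes most correlated with the
        # training outcome, remembering the sign of each correlation.
        corrs = [(pearson(g, y_train), j) for j, g in enumerate(g_train)]
        top = sorted(corrs, key=lambda c: abs(c[0]), reverse=True)[:n_selected]
        selected = [(j, 1.0 if r >= 0 else -1.0) for r, j in top]
        # Apparent performance reuses the data that drove the selection;
        # validated performance uses an independent data set.
        apparent = pearson(score(g_train, selected), y_train) ** 2
        validated = pearson(score(g_valid, selected), y_valid) ** 2
        gaps.append(apparent - validated)
    return statistics.fmean(gaps)


small = mean_optimism(n_samples=50)
large = mean_optimism(n_samples=400)
print(f"mean optimism, n=50:  {small:.3f}")
print(f"mean optimism, n=400: {large:.3f}")
```

Under these assumptions, the gap between apparent and validated performance should be clearly larger for 50 samples than for 400, in line with the statement that optimism from gene selection decreases as the number of samples increases. Because the clinical variables in the study were fixed in advance rather than selected, they would not be exposed to this particular source of optimism.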