Feinstein A R, Wells C K, Walter S D
Yale University School of Medicine, New Haven, CT 06510.
J Clin Epidemiol. 1990;43(4):339-47. doi: 10.1016/0895-4356(90)90120-e.
This paper and the two following papers (Parts I-III) report an investigation of performance variability for four multivariable methods: discriminant function analysis, and linear, logistic, and Cox regression. Each method was examined for its performance in using the same independent variables to develop predictive models for survival of a large cohort of patients with lung cancer. The cogent biologic attributes of the patients had previously been divided into five ordinal stages having a strong prognostic gradient. With stratified random sampling, we prepared seven "generating" sets of data in which the five biologic stages were arranged in proportional, uniform, symmetrical unimodal, decreasing exponential, increasing exponential, U-shaped, or bi-modal distributions. Each of the multivariable methods was applied to each of the seven generating distributions, and the results were tested in a separate "challenge" set, which had not been included in any of the generating sets. The research was intended not merely to compare the performance of the multivariable methods, but also to see how their performance would be affected by different statistical distributions of the same cogent biologic attributes. The results, which are presented in the second and third papers, were compared for selection of independent variables and coefficients, and for accuracy in fitting the generating sets and the challenge set.
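The sampling design described in the abstract (a held-out "challenge" set plus "generating" sets whose five stages follow a prescribed distribution shape) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not taken from the paper: the cohort is synthetic, the stage weights in `SHAPES` are hypothetical, the challenge set is drawn by simple random sampling, and the helper names (`synthetic_cohort`, `stage_quotas`, `split_challenge`, `generating_set`) are invented. The abstract does not report the actual proportions, sample sizes, or how the challenge set was drawn.

```python
import random
from collections import Counter

STAGES = [1, 2, 3, 4, 5]

# Hypothetical relative weights per stage for six of the seven shapes.
# The seventh, "proportional", reuses the stage counts observed in the pool.
SHAPES = {
    "uniform": [1, 1, 1, 1, 1],
    "symmetrical_unimodal": [1, 2, 4, 2, 1],
    "decreasing_exponential": [16, 8, 4, 2, 1],
    "increasing_exponential": [1, 2, 4, 8, 16],
    "u_shaped": [4, 2, 1, 2, 4],
    "bimodal": [1, 4, 1, 4, 1],
}


def synthetic_cohort(counts):
    """Build a fake cohort with counts[i] patients in stage STAGES[i]."""
    stages = [s for s, c in zip(STAGES, counts) for _ in range(c)]
    return [{"id": i, "stage": s} for i, s in enumerate(stages)]


def stage_quotas(weights, n):
    """Split n into per-stage quotas proportional to weights (largest remainder)."""
    total = sum(weights)
    raw = [w * n / total for w in weights]
    quotas = [int(r) for r in raw]
    # Give the leftover units to the stages with the largest fractional parts.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - quotas[i], reverse=True)
    for i in order[: n - sum(quotas)]:
        quotas[i] += 1
    return quotas


def split_challenge(patients, fraction, rng):
    """Hold out a random challenge set; return (pool, challenge)."""
    shuffled = patients[:]
    rng.shuffle(shuffled)
    k = int(len(shuffled) * fraction)
    return shuffled[k:], shuffled[:k]


def generating_set(pool, weights, n, rng):
    """Stratified random sample of size n from the pool, shaped by weights.

    Raises ValueError if a stage has fewer patients than its quota.
    """
    by_stage = {s: [p for p in pool if p["stage"] == s] for s in STAGES}
    sample = []
    for s, quota in zip(STAGES, stage_quotas(weights, n)):
        sample.extend(rng.sample(by_stage[s], quota))
    return sample


if __name__ == "__main__":
    rng = random.Random(0)
    cohort = synthetic_cohort([300, 250, 200, 150, 100])
    pool, challenge = split_challenge(cohort, 0.2, rng)

    shapes = dict(SHAPES)
    pool_counts = Counter(p["stage"] for p in pool)
    shapes["proportional"] = [pool_counts[s] for s in STAGES]

    for name, weights in shapes.items():
        gen = generating_set(pool, weights, 100, rng)
        counts = Counter(p["stage"] for p in gen)
        print(name, [counts[s] for s in STAGES])
```

Because every generating set is drawn only from the pool, none of them can share a patient with the challenge set, which mirrors the separation the abstract describes. Each multivariable method would then be fitted to each generating set and scored on the same challenge set.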