Habbema J D, Gelpke G J
Comput Programs Biomed. 1981 Sep-Dec;13(3-4):251-70. doi: 10.1016/0010-468x(81)90103-3.
The computer program INDEP-SELECT selects an optimal subset from a set of possibly informative diagnostic or prognostic variables. It is equally useful for other discriminant analysis or pattern recognition problems that involve variable selection. The approach is probabilistic: diagnostic probabilities are assigned to a patient on the basis of the values observed on the diagnostic variables. The statistical model is largely based on the assumption of independence between the variables, but one model parameter, the so-called 'global association factor', is added to take dependence into account. Selection is stepwise forward: in each step, one new variable is added to the set of already selected variables. The user chooses among several selection criteria, all based on measures of diagnostic or prognostic performance; the chosen criterion determines which variable is added in each step. INDEP-SELECT can handle a large number of variables, including variables with missing data, and a large number of patients. The program is written in ANS Standard FORTRAN and takes relatively little computation time.
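The selection scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the INDEP-SELECT FORTRAN code, and several details are assumptions made for illustration: the variables are categorical, missing values are coded as `None` and simply skipped, the 'global association factor' is modelled as an exponent `assoc` applied to the product of per-variable likelihoods, the selection criterion is the mean logarithmic score on the training data, and selection stops when the gain drops below a hypothetical threshold `min_gain`. All function names and the toy data are likewise hypothetical.

```python
import math
from collections import Counter


def fit_tables(rows, labels, n_vars, smooth=1.0):
    """Estimate class priors and per-variable P(value | class) tables.

    Missing values (None) are ignored; `smooth` is additive smoothing.
    """
    classes = sorted(set(labels))
    prior = {c: labels.count(c) / len(labels) for c in classes}
    tables = []
    for j in range(n_vars):
        values = sorted({r[j] for r in rows if r[j] is not None})
        counts = {c: Counter() for c in classes}
        for r, y in zip(rows, labels):
            if r[j] is not None:
                counts[y][r[j]] += 1
        table = {}
        for c in classes:
            total = sum(counts[c].values()) + smooth * len(values)
            table[c] = {v: (counts[c][v] + smooth) / total for v in values}
        tables.append(table)
    return prior, tables


def posterior(row, selected, prior, tables, assoc=1.0):
    """Class probabilities under the independence model.

    Assumption: `assoc` (0 < assoc <= 1) scales the summed log-likelihoods,
    damping the evidence to allow for dependence between variables.
    """
    log_post = {}
    for c, p in prior.items():
        loglik = sum(math.log(tables[j][c][row[j]])
                     for j in selected if row[j] is not None)
        log_post[c] = math.log(p) + assoc * loglik
    top = max(log_post.values())
    unnorm = {c: math.exp(v - top) for c, v in log_post.items()}
    z = sum(unnorm.values())
    return {c: u / z for c, u in unnorm.items()}


def mean_log_score(rows, labels, selected, prior, tables, assoc):
    """Assumed criterion: mean log probability given to the true class."""
    return sum(math.log(posterior(r, selected, prior, tables, assoc)[y])
               for r, y in zip(rows, labels)) / len(rows)


def forward_select(rows, labels, n_vars, assoc=1.0, min_gain=1e-3):
    """Stepwise forward selection: add the best remaining variable per step."""
    prior, tables = fit_tables(rows, labels, n_vars)
    selected = []
    best = mean_log_score(rows, labels, selected, prior, tables, assoc)
    while len(selected) < n_vars:
        scored = [(mean_log_score(rows, labels, selected + [j],
                                  prior, tables, assoc), j)
                  for j in range(n_vars) if j not in selected]
        score, j = max(scored)
        if score - best < min_gain:
            break  # no remaining variable improves the criterion enough
        selected.append(j)
        best = score
    return selected, best


# Toy data: variable 0 is informative, variable 1 is pure noise,
# variable 2 is weakly informative and has missing values.
rows = [(1, 0, 1), (1, 1, None), (1, 0, 0), (0, 1, 1),
        (0, 0, 0), (0, 1, None), (0, 0, 1), (1, 1, 0)]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]

selected, score = forward_select(rows, labels, n_vars=3)
print("selected variables:", selected, "mean log score:", round(score, 4))
```

In this sketch, setting `assoc` below 1 pulls the posterior probabilities toward the priors, which is one simple way to counteract the overconfidence an independence model shows when variables are in fact correlated.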