Department of Pediatrics, Stanford University, MSOB X111, Stanford, CA 94305, USA.
BMC Bioinformatics. 2011 Jul 29;12:312. doi: 10.1186/1471-2105-12-312.
When a large number of candidate variables are present, a dimension reduction procedure is usually conducted to reduce the variable space before the subsequent analysis is carried out. The goal of dimension reduction is to find a list of candidate genes of more manageable length that ideally includes all the relevant genes. Leaving many uninformative genes in the analysis can lead to biased estimates and reduced power. Dimension reduction is therefore often considered a necessary precursor to the analysis: it not only reduces the cost of handling numerous variables but can also improve the performance of the downstream analysis algorithms.
We propose TMLE-VIM, a dimension reduction procedure based on variable importance measurement (VIM) in the framework of targeted maximum likelihood estimation (TMLE). TMLE is an extension of maximum likelihood estimation that targets the parameter of interest. TMLE-VIM is a two-stage procedure: the first stage uses a machine learning algorithm to obtain an initial estimate, and the second stage improves that estimate with respect to the parameter of interest.
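The two-stage idea can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that do not come from the abstract: the importance of a variable A is taken to be the coefficient beta in a partially linear model E[Y | A, W] = beta * A + f(W), ridge regression stands in for the first-stage machine learning algorithm, E[A | W] is estimated by ordinary least squares, and the names `ridge_fit` and `tmle_vim_sketch` are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge coefficients for centered data (stand-in for the stage-1 learner)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def tmle_vim_sketch(X, y, lam=1.0):
    """Illustrative two-stage importance estimate for every column of X.

    Assumption: importance of column j is beta_j in
    E[Y | A, W] = beta_j * A + f(W), with A = X[:, j] and W the other columns.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)          # center so no intercept is needed
    yc = y - y.mean()

    # Stage 1: initial fit of E[Y | all variables].
    b_init = ridge_fit(Xc, yc, lam)
    q_init = Xc @ b_init

    beta = np.empty(p)
    z = np.empty(p)
    for j in range(p):
        a = Xc[:, j]
        w = np.delete(Xc, j, axis=1)

        # Estimate E[A | W] by least squares; h is the part of A not explained by W.
        coef, *_ = np.linalg.lstsq(w, a, rcond=None)
        h = a - w @ coef

        # Stage 2: fluctuate the initial fit along h, targeting beta_j.
        eps = h @ (yc - q_init) / (h @ h)
        beta[j] = b_init[j] + eps

        # Influence-curve-based standard error for ranking purposes.
        resid = yc - q_init - eps * h
        ic = h * resid / np.mean(h ** 2)
        z[j] = beta[j] / (ic.std(ddof=1) / np.sqrt(n))
    return beta, z


# Simulated example: X0 drives Y; X1 is correlated with X0 but has no effect.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 5))
X[:, 1] = 0.8 * X[:, 0] + 0.6 * X[:, 1]
y = 2.0 * X[:, 0] + rng.normal(size=n)

beta, z = tmle_vim_sketch(X, y)
ranking = np.argsort(-np.abs(z))     # most important variable first
print(ranking, np.round(beta, 2))
```

Because the second stage adjusts each variable for the others through h, a variable that is merely correlated with a truly associated one (X1 here) should receive a small importance. Truncating `ranking` at a chosen length then gives the reduced variable list. With a linear first-stage learner and a least-squares estimate of E[A | W], the targeted estimate in this sketch coincides with the ordinary least-squares coefficient; the second stage only has more to correct when the initial learner is more flexible.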
We demonstrate with simulations and data analyses that our approach not only enjoys the predictive power of machine learning algorithms but also accounts for the correlation structure among variables, and therefore produces better variable rankings. When used for dimension reduction, TMLE-VIM can help to obtain the shortest possible list containing the most truly associated variables.