He Kevin, Kang Jian, Hong Hyokyoung G, Zhu Ji, Li Yanming, Lin Huazhen, Xu Han, Li Yi
Department of Biostatistics, School of Public Health, University of Michigan.
Department of Statistics and Probability, Michigan State University.
Comput Stat Data Anal. 2019 Apr;132:100-114. doi: 10.1016/j.csda.2018.09.001. Epub 2018 Sep 22.
Modern biotechnologies have produced vast amounts of high-throughput data in which the number of predictors far exceeds the sample size. To identify novel biomarkers and understand biological mechanisms, it is vital to detect signals that are only weakly associated with outcomes among ultrahigh-dimensional predictors. However, existing screening methods, which typically ignore correlation information, are likely to miss these weak signals. By incorporating inter-feature dependence, a covariance-insured screening approach is proposed to identify predictors that are jointly informative but marginally only weakly associated with outcomes. The validity of the method is examined via extensive simulations and a real data study on selecting potential genetic factors related to the onset of multiple myeloma.
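The phenomenon that motivates the approach, a predictor that is jointly informative yet marginally almost unassociated with the outcome, can be illustrated with a small simulation. The sketch below is not the covariance-insured screening procedure itself, which the abstract does not specify. It is a two-predictor toy example under assumed settings: a hypothetical helper `marginal_vs_joint`, correlation `rho = 0.8` between the predictors, coefficients `(1, -rho)`, and noise standard deviation 0.5, all chosen only so that the marginal covariance between the second predictor and the outcome cancels to zero.

```python
import numpy as np


def marginal_vs_joint(n=5000, rho=0.8, seed=0):
    """Compare marginal correlations with joint least-squares coefficients.

    Illustrative assumption: y = x1 - rho * x2 + noise, with corr(x1, x2) = rho.
    Then cov(x2, y) = -rho + rho * 1 = 0, so x2 looks uninformative marginally
    even though its joint coefficient is -rho.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    X = rng.multivariate_normal(np.zeros(2), cov, size=n)
    beta = np.array([1.0, -rho])
    y = X @ beta + 0.5 * rng.standard_normal(n)

    # Marginal screening statistic: Pearson correlation of each predictor with y.
    marginal = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(2)])

    # Joint fit: ordinary least squares on both predictors together
    # (no intercept, since all variables are simulated with mean zero).
    joint, *_ = np.linalg.lstsq(X, y, rcond=None)
    return marginal, joint


marginal, joint = marginal_vs_joint()
print("marginal correlations with y:", np.round(marginal, 3))
print("joint OLS coefficients:      ", np.round(joint, 3))
```

In this toy setting, a screening rule that ranks predictors by marginal correlation alone would tend to discard the second predictor, whereas a fit that accounts for the dependence between the two predictors recovers its nonzero effect.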