Liu Bingyuan, Zhang Qi, Xue Lingzhou, Song Peter X-K, Kang Jian
The Pennsylvania State University.
University of Michigan.
J Am Stat Assoc. 2024;119(545):715-729. doi: 10.1080/01621459.2022.2142590. Epub 2022 Dec 8.
It is important to develop statistical techniques for analyzing high-dimensional data in the presence of both complex dependence and possible heavy tails and outliers, as arise in real-world applications such as imaging data analysis. We propose a new robust high-dimensional regression method with coefficient thresholding, which combines a thresholding function with the robust Huber loss to yield an efficient nonconvex estimation procedure. The proposed regularization method accounts for complex dependence structures in the predictors and is robust against heavy tails and outliers in the outcomes. Theoretically, we rigorously analyze the landscape of the population and empirical risk functions for the proposed method. The favorable landscape enables us to establish both statistical consistency and computational convergence in the high-dimensional setting. We also present an extension that incorporates spatial information into the proposed method. Finite-sample properties of the proposed methods are examined through extensive simulation studies. We apply the method to a scalar-on-image regression analysis of the association between psychiatric disorder, measured by the general factor of psychopathology, and features extracted from task functional MRI data in the Adolescent Brain Cognitive Development (ABCD) study.
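For readers unfamiliar with the Huber loss named in the abstract, the following is a minimal sketch of its standard textbook form: quadratic for small residuals and linear for large ones, which is what limits the influence of outliers. The function name `huber_loss` and the default cutoff `delta=1.345` are illustrative assumptions, not taken from the paper, and the sketch does not cover the paper's thresholding function or its estimation procedure.

```python
import numpy as np

def huber_loss(residual, delta=1.345):
    """Standard Huber loss, applied elementwise.

    `delta` is a hypothetical default cutoff chosen for illustration;
    the abstract does not state which value the authors use.
    """
    r = np.asarray(residual, dtype=float)
    abs_r = np.abs(r)
    quadratic = 0.5 * r**2                    # used where |r| <= delta
    linear = delta * (abs_r - 0.5 * delta)    # used where |r| >  delta
    return np.where(abs_r <= delta, quadratic, linear)
```

Because the loss grows only linearly beyond `delta`, a single extreme outcome contributes far less to the fitted risk than it would under squared error.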