Model diagnostics in reduced-rank estimation.

Author information

Chen Kun

Affiliation

Department of Statistics, University of Connecticut, 215 Glenbrook Rd. U-4120, Storrs, CT 06269-4120

Publication information

Stat Interface. 2016;9(4):469-484. doi: 10.4310/SII.2016.v9.n4.a7.

Abstract

Reduced-rank methods are widely used in high-dimensional multivariate analysis for simultaneous dimension reduction and model estimation. However, the commonly used reduced-rank methods are not robust, as the underlying reduced-rank structure can be easily distorted by only a few outliers. Anomalies are bound to exist in big data problems, and in some applications they may themselves be of primary interest. Naive residual analysis is often inadequate for outlier detection because of potential masking and swamping, while robust reduced-rank estimation approaches can be computationally demanding. Under Stein's unbiased risk estimation framework, we propose a set of tools, including leverage scores and generalized information scores, to perform model diagnostics and outlier detection in large-scale reduced-rank estimation. The leverage scores give an exact decomposition of the so-called model degrees of freedom to the observation level, which leads to an exact decomposition of many commonly used information criteria; the resulting quantities are thus named information scores of the observations. The proposed information score approach provides a principled way of combining the residuals and leverage scores for anomaly detection. Simulation studies confirm that the proposed diagnostic tools work well. A pattern recognition example with handwritten digit images and a time series analysis example with monthly U.S. macroeconomic data further demonstrate the efficacy of the proposed approaches.
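The idea of splitting the model degrees of freedom across observations can be illustrated with a small numerical sketch. The code below is not the paper's closed-form leverage score. It assumes classical reduced-rank regression with identity weighting, treats the degrees of freedom as the divergence of the fitted values with respect to the responses (the quantity that appears in Stein's unbiased risk estimate), and approximates each observation's share by finite differences. The function names rrr_fit and leverage_scores and the step size eps are hypothetical choices made for this illustration.

```python
import numpy as np


def rrr_fit(X, Y, rank):
    """Fitted values of classical reduced-rank regression (identity weighting).

    Projects Y onto the column space of X, then keeps the leading `rank`
    right singular directions of that least-squares fit.
    """
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    Y_ols = X @ B_ols
    _, _, Vt = np.linalg.svd(Y_ols, full_matrices=False)
    V_r = Vt[:rank].T
    return Y_ols @ V_r @ V_r.T


def leverage_scores(X, Y, rank, eps=1e-6):
    """Finite-difference leverage of each observation (assumed definition).

    lev[i] approximates sum_k d(fit[i, k]) / d(Y[i, k]); the scores add up
    to the divergence of the fit, i.e. a degrees-of-freedom estimate.
    """
    n, q = Y.shape
    base = rrr_fit(X, Y, rank)
    lev = np.zeros(n)
    for i in range(n):
        for k in range(q):
            Y_pert = Y.copy()
            Y_pert[i, k] += eps  # perturb one response entry
            lev[i] += (rrr_fit(X, Y_pert, rank)[i, k] - base[i, k]) / eps
    return lev


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, q = 30, 4, 3
    X = rng.standard_normal((n, p))
    # Simulated rank-1 coefficient matrix plus noise (illustrative data only)
    B = np.outer(rng.standard_normal(p), rng.standard_normal(q))
    Y = X @ B + 0.5 * rng.standard_normal((n, q))

    lev = leverage_scores(X, Y, rank=1)
    resid = np.linalg.norm(Y - rrr_fit(X, Y, rank=1), axis=1)
    print("estimated degrees of freedom:", lev.sum())
    print("observation with largest leverage:", int(np.argmax(lev)))
    print("observation with largest residual:", int(np.argmax(resid)))
```

When the rank equals the number of responses, the fit reduces to ordinary least squares and each score equals q times the corresponding diagonal entry of the hat matrix, which gives a simple sanity check. How residuals and leverage are combined into an information score is specified in the paper itself and is not reproduced here.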
