Distribution of variables by method of outlier detection.

Author information

Finch W Holmes

Affiliation

Department of Educational Psychology, Ball State University, Muncie, IN, USA.

Publication information

Front Psychol. 2012 Jul 5;3:211. doi: 10.3389/fpsyg.2012.00211. eCollection 2012.

DOI:10.3389/fpsyg.2012.00211

PMID:22783214

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3389806/

Abstract

The presence of outliers can be very problematic in data analysis, leading statisticians to develop a wide variety of methods for identifying them in both the univariate and multivariate contexts. In the multivariate case, perhaps the most popular approach has been Mahalanobis distance, where large values suggest an observation that is unusual compared to the center of the data. However, researchers have identified problems with the application of this metric, such that its utility may be limited in some situations. As a consequence, other methods for detecting outlying observations have been developed and studied. A number of these approaches, while apparently robust and useful, have not made their way into general practice in the social sciences. Thus, the goal of this study was to describe some of these methods and demonstrate them using a well-known dataset from a popular multivariate textbook widely used in the social sciences. Results demonstrated that the methods do indeed produce datasets with very different distributional characteristics. These results are discussed in light of how they might be used by researchers and practitioners.
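To make the Mahalanobis distance idea concrete, here is a minimal sketch for the two-variable case. It assumes the classical sample mean and sample covariance (divisor n - 1) as the center and scatter estimates; the function name `mahalanobis_sq_2d` and the toy data are hypothetical and are not taken from the study.

```python
from statistics import mean


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each (x, y) point from the sample centroid.

    Assumption: classical (non-robust) estimates, i.e. the sample mean and
    the sample covariance matrix with divisor n - 1.
    """
    n = len(points)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)

    # Entries of the 2x2 sample covariance matrix S.
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)

    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")

    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d' S^{-1} d, written out using the closed-form 2x2 inverse.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


# Hypothetical toy data: seven points that rise together, plus one
# point, (8, 1), that breaks the pattern.
toy_points = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5), (7, 8), (8, 1)]
toy_distances = mahalanobis_sq_2d(toy_points)

if __name__ == "__main__":
    ranked = sorted(zip(toy_distances, toy_points), reverse=True)
    for d2, point in ranked:
        print(f"{point}: D^2 = {d2:.3f}")
```

In this toy sample the off-pattern point receives the largest squared distance. Because the mean and covariance are themselves computed from all observations, including any outliers, the classical distance can be distorted by the very points it is meant to flag, which is one reason robust alternatives are of interest.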
