Understanding the influence of individual variables contributing to multivariate outliers in assessments of data quality.

Authors

Zink Richard C, Castro-Schilo Laura, Ding Jianfeng

Affiliations

TARGET PharmaSolutions; Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.

JMP Division, SAS Institute, Cary, NC, USA.

Publication information

Pharm Stat. 2018 Nov;17(6):846-853. doi: 10.1002/pst.1903. Epub 2018 Sep 26.

DOI: 10.1002/pst.1903
PMID: 30259643

Abstract

Mahalanobis distance is often recommended to identify patients or clinical sites that are considered unusual in clinical trials. Patients extreme in one or more covariates may be considered outliers in that they reside some distance from the multivariate mean, which can be thought of as the center of the data cloud. Less often discussed, patients whose data are believed to be "too good to be true" are located near the centroid as inliers. In order to efficiently investigate these anomalies for potential lapses in data quality, it is important to understand how the individual variables contribute to each multivariate outlier. There is a lack of literature describing a reasonable workflow for identification of outliers and their subsequent investigation to understand how each variable contributes to an observation being considered extreme. We describe how to identify multivariate inliers and outliers, classify outliers according to varying levels of severity, and summarize the contributions of variables using principal components in a manner that is accessible to a wide audience with straightforward interpretation. We illustrate how numerous data visualizations, including Pareto plots, can facilitate further review even in studies containing numerous observations and variables. We illustrate these methodologies using data from a multicenter clinical trial.
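
The abstract does not spell out the computation, so the following is only a minimal sketch of the general idea, not the authors' procedure. It relies on the standard identity that the squared Mahalanobis distance equals the sum of squared principal component scores, each divided by its eigenvalue. The function names, the cutoff parameters, and the rule for spreading each component's share across variables by squared loadings are all assumptions made for illustration. Under multivariate normality, cutoffs are commonly taken from lower and upper quantiles of a chi-square distribution with as many degrees of freedom as there are variables; here they are simply passed in.

```python
import numpy as np


def mahalanobis_pc_contributions(X):
    """Squared Mahalanobis distances with per-component and per-variable shares.

    Assumes X has at least two columns and a non-singular sample covariance.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # Eigendecomposition of the covariance matrix: cov = V diag(lam) V^T
    lam, V = np.linalg.eigh(cov)
    scores = centered @ V            # principal component scores
    pc_contrib = scores**2 / lam     # each row sums to the squared distance
    d2 = pc_contrib.sum(axis=1)
    # Illustrative attribution (an assumption, not the paper's method):
    # split each component's share across variables by squared loadings.
    # Every column of V has unit length, so rows of var_contrib sum to d2.
    var_contrib = pc_contrib @ (V**2).T
    return d2, pc_contrib, var_contrib


def flag_observations(d2, inlier_cutoff, outlier_cutoff):
    """Label observations near the centroid as inliers and distant ones as outliers."""
    labels = np.full(d2.shape, "typical", dtype=object)
    labels[d2 <= inlier_cutoff] = "inlier"
    labels[d2 >= outlier_cutoff] = "outlier"
    return labels
```

Sorting one row of `var_contrib` in descending order gives the ordering a Pareto plot of variable contributions would display for that observation.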

Similar articles

1. Understanding the influence of individual variables contributing to multivariate outliers in assessments of data quality. Pharm Stat. 2018 Nov;17(6):846-853. doi: 10.1002/pst.1903. Epub 2018 Sep 26.
2. Truncated outlier filtering. J Biopharm Stat. 2014;24(5):1115-29. doi: 10.1080/10543406.2014.926366.
3. The utility of multivariate outlier detection techniques for data quality evaluation in large studies: an application within the ONDRI project. BMC Med Res Methodol. 2019 May 15;19(1):102. doi: 10.1186/s12874-019-0737-5.
4. Outlier identification in radiation therapy knowledge-based planning: A study of pelvic cases. Med Phys. 2017 Nov;44(11):5617-5626. doi: 10.1002/mp.12556. Epub 2017 Sep 30.
5. Locally centred Mahalanobis distance: a new distance measure with salient features towards outlier detection. Anal Chim Acta. 2013 Jul 17;787:1-9. doi: 10.1016/j.aca.2013.04.034. Epub 2013 Apr 27.
6. Application of methods for central statistical monitoring in clinical trials. Clin Trials. 2013 Oct;10(5):783-806. doi: 10.1177/1740774513494504.
7. Multivariate Outliers: A Conceptual and Practical Overview for the Nurse and Health Researcher. Can J Nurs Res. 2021 Sep;53(3):316-321. doi: 10.1177/0844562120932054. Epub 2020 Jun 10.
8. Automated multigroup outlier identification in molecular high-throughput data using bagplots and gemplots. BMC Bioinformatics. 2017 May 2;18(1):232. doi: 10.1186/s12859-017-1645-5.
9. Translational Metabolomics of Head Injury: Exploring Dysfunctional Cerebral Metabolism with Ex Vivo NMR Spectroscopy-Based Metabolite Quantification.
10. Distribution of variables by method of outlier detection. Front Psychol. 2012 Jul 5;3:211. doi: 10.3389/fpsyg.2012.00211. eCollection 2012.

Articles citing this paper

1. Central statistical monitoring in clinical trial management: A scoping review. Clin Trials. 2025 Jun;22(3):342-351. doi: 10.1177/17407745241304059. Epub 2025 Jan 2.
2. Bayesian central statistical monitoring using finite mixture models in multicenter clinical trials. Contemp Clin Trials Commun. 2020 Apr 9;19:100566. doi: 10.1016/j.conctc.2020.100566. eCollection 2020 Sep.