A Primer on the Use of Equivalence Testing for Evaluating Measurement Agreement.

Affiliation

Department of Statistics, Iowa State University, Ames, IA.

Publication information

Med Sci Sports Exerc. 2018 Apr;50(4):837-845. doi: 10.1249/MSS.0000000000001481.

DOI:10.1249/MSS.0000000000001481

PMID:29135817

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5856600/

Abstract

PURPOSE

Statistical equivalence testing is more appropriate than conventional tests of difference to assess the validity of physical activity (PA) measures. This article presents the underlying principles of equivalence testing and gives three examples from PA and fitness assessment research.

METHODS

The three examples illustrate different uses of equivalence tests. Example 1 uses PA data to evaluate an activity monitor's equivalence to a known criterion. Example 2 illustrates the equivalence of two field-based measures of physical fitness with no known reference method. Example 3 uses regression to evaluate an activity monitor's equivalence across a suite of 23 activities.

RESULTS

The examples illustrate the appropriate reporting and interpretation of results from equivalence tests. In the first example, the mean criterion measure is significantly within ±15% of the mean PA monitor. The mean difference is 0.18 METs and the 90% confidence interval of -0.15 to 0.52 is inside the equivalence region of -0.65 to 0.65. In the second example, we chose to define equivalence for these two measures as a ratio of mean values between 0.98 and 1.02. The estimated ratio of mean V̇O2 values is 0.99, which is significantly (P = 0.007) inside the equivalence region. In the third example, the PA monitor is not equivalent to the criterion across the suite of activities. The estimated regression intercept and slope are -1.23 and 1.06. Neither confidence interval is within the suggested regression equivalence regions.
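The decision rule used in the first example can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `ci_within_region` is hypothetical, and the only numbers taken from the abstract are the 90% confidence interval and the equivalence region for Example 1. Checking that a 90% confidence interval lies inside the region corresponds to two one-sided tests at the 5% level.

```python
def ci_within_region(ci, region):
    """Return True when the confidence interval lies entirely inside the equivalence region."""
    ci_low, ci_high = ci
    eq_low, eq_high = region
    return eq_low < ci_low and ci_high < eq_high


# Values reported in the abstract for Example 1 (units: METs).
example1_ci = (-0.15, 0.52)      # 90% CI of the mean difference
example1_region = (-0.65, 0.65)  # equivalence region

# Hypothetical wider interval, included only to show a failing case.
hypothetical_wide_ci = (-0.70, 0.52)

print(ci_within_region(example1_ci, example1_region))           # True
print(ci_within_region(hypothetical_wide_ci, example1_region))  # False
```

The rule is deliberately asymmetric with respect to a difference test: the interval may contain zero or not, and what matters is only whether both of its ends stay inside the region chosen in advance.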

CONCLUSIONS

When the study goal is to show similarity between methods, equivalence testing is more appropriate than traditional statistical tests of differences (e.g., ANOVA and t-tests).
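The contrast between the two kinds of test can be illustrated with a small sketch. Everything here is an assumption made for illustration: the paired differences are invented, the function name `tost_paired_z` is hypothetical, and a large-sample normal approximation is used in place of the t distribution so that the example needs only the Python standard library. The ±0.65 region is borrowed from Example 1 purely as a convenient margin.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def tost_paired_z(diffs, eq_low, eq_high):
    """Two one-sided tests on paired differences (normal approximation).

    Returns the mean difference, the equivalence p-value (the larger of the
    two one-sided p-values), and the p-value of a two-sided difference test.
    """
    n = len(diffs)
    d_bar = mean(diffs)
    se = stdev(diffs) / sqrt(n)
    z = NormalDist()
    p_lower = 1 - z.cdf((d_bar - eq_low) / se)  # H0: mean difference <= eq_low
    p_upper = z.cdf((d_bar - eq_high) / se)     # H0: mean difference >= eq_high
    p_equiv = max(p_lower, p_upper)
    p_diff = 2 * (1 - z.cdf(abs(d_bar) / se))   # H0: mean difference == 0
    return d_bar, p_equiv, p_diff


# Hypothetical, noisy paired differences (monitor minus criterion).
noisy = [1.2, -1.5, 0.9, -0.8, 1.6, -1.1, 0.4, -0.3]
# Hypothetical, tightly clustered paired differences.
tight = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.0, 0.0]

for label, diffs in (("noisy", noisy), ("tight", tight)):
    d_bar, p_equiv, p_diff = tost_paired_z(diffs, -0.65, 0.65)
    print(f"{label}: mean diff={d_bar:.3f}, "
          f"equivalence p={p_equiv:.3f}, difference p={p_diff:.3f}")
```

With the noisy data the difference test is non-significant, yet equivalence is not established either, which is why a non-significant t-test on its own is weak evidence of agreement. With the tight data the equivalence test rejects both one-sided null hypotheses.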

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7971/5856600/778b56669925/nihms916849f1.jpg
