Department of Ophthalmology and Visual Sciences, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin, USA.
University of Michigan Medical School, University of Michigan, Ann Arbor, Michigan, USA.
Ophthalmic Genet. 2021 Jun;42(3):283-290. doi: 10.1080/13816810.2021.1897848. Epub 2021 Mar 17.
Background: Several novel treatments for inherited retinal degenerations have undergone phase I/IIa clinical trials with limited sample sizes, yet investigators must still determine whether toxicity or an efficacy signal occurred or whether the change was due to test-retest variability (TRV) of the measurement tool.
Methods: Synthetic datasets were used to compare three types of TRV estimators under different sample sizes, mean drift, skewness, and numbers of baseline measurements.
Results: Mixed effects models underestimated the standard deviation of measurement error (SDEM); the unbiased change score estimator method (UBS) was more accurate. The fixed effect model had less bias and a smaller standard deviation than UBS when there were >2 baseline measurements. The change score estimator had no bias; other estimators introduced bias for lower variability. With sample size <10, all estimators had high variance. With sample size ≥10, the differences between methods were often minimal. The pooled estimator model did not capture drift, whereas a fixed effect regression or mixed effects models accounted for drift while maintaining an accurate measure of variance. With small sample sizes, the bootstrap estimates of SDEM were severe underestimates, while the jackknife estimates were mildly low but much better. The jackknife was more accurate for the unbiased change score method than for the pooled estimator.
Conclusions: The ideal phase I/IIa study has ≥20 subjects and uses UBS, or its fixed effect model generalization if there are >2 baseline measurements. With non-ideal study parameters, investigators should at least quantify the error estimate present in their data analysis.
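As an illustration only, the idea behind a change score estimate of SDEM can be sketched in Python. The abstract does not give the formulas the authors used, so this is a textbook-style sketch under stated assumptions, not the paper's UBS method: two baseline measurements per subject, independent errors with equal variance, so that the variance of the within-subject difference is twice the error variance. The function names (`sdem_change_score`, `jackknife_se`, `simulate_baselines`) and the simulation parameters are hypothetical.

```python
import math
import random
import statistics


def sdem_change_score(first, second):
    """Estimate the SD of measurement error from paired baseline values.

    Assumes independent, equal-variance errors, so Var(second - first)
    equals 2 * sigma**2. Using the sample SD of the differences removes
    any constant drift between the two visits.
    """
    diffs = [b - a for a, b in zip(first, second)]
    return statistics.stdev(diffs) / math.sqrt(2)


def jackknife_se(first, second):
    """Leave-one-subject-out jackknife standard error of the estimate.

    Needs at least 3 subjects so every leave-one-out sample still has
    2 differences.
    """
    n = len(first)
    leave_one_out = [
        sdem_change_score(first[:i] + first[i + 1:], second[:i] + second[i + 1:])
        for i in range(n)
    ]
    centre = sum(leave_one_out) / n
    return math.sqrt((n - 1) / n * sum((t - centre) ** 2 for t in leave_one_out))


def simulate_baselines(n_subjects, sigma, drift=0.0, seed=0):
    """Hypothetical synthetic data: true value plus Gaussian error per visit."""
    rng = random.Random(seed)
    truth = [rng.gauss(50.0, 10.0) for _ in range(n_subjects)]
    first = [t + rng.gauss(0.0, sigma) for t in truth]
    second = [t + drift + rng.gauss(0.0, sigma) for t in truth]
    return first, second


if __name__ == "__main__":
    visit1, visit2 = simulate_baselines(n_subjects=20, sigma=2.0, drift=1.0)
    print("SDEM estimate:", sdem_change_score(visit1, visit2))
    print("jackknife SE:", jackknife_se(visit1, visit2))
```

Rerunning the simulation with fewer than 10 subjects and different seeds gives a rough feel for how widely such an estimate can vary in small samples, which is the concern the abstract raises.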