Clinical Neurology Research Group, Peninsula College of Medicine and Dentistry, Tamar Science Park, Room N13 ITTC Building, Davy Road, Plymouth, UK.
J Neurol. 2012 Dec;259(12):2681-94. doi: 10.1007/s00415-012-6570-y. Epub 2012 Jun 24.
Rating scales are increasingly used in neurologic research and trials. A key question for their use across the range of neurologic diseases, both common and rare, is what sample sizes provide meaningful estimates of reliability and validity. Here, we address two questions: (1) to what extent does sample size influence the stability of reliability and validity estimates; and (2) to what extent does sample size influence the inferences made from reliability and validity testing? We examined data from two studies. In Study 1, we retrospectively reduced the total sample, randomly and nonrandomly, by decrements of approximately 50% to generate sub-samples ranging from n = 713 down to n = 20. In Study 2, we prospectively generated sub-samples ranging from n = 20 to n = 320, by time of entry into the study. In all samples we estimated reliability (internal consistency, item-total correlations, test-retest) and validity (within-scale correlations, convergent and discriminant construct validity). Reliability estimates were stable in magnitude and interpretation in all sub-samples of both studies. Validity estimates were stable in samples of n ≥ 80, for 75% of scales in samples of n = 40, and for 50% of scales in samples of n = 20. In this study, minimum sample sizes of 20 for reliability testing and 80 for validity testing provided estimates highly representative of the main study samples. These findings should be considered provisional, and more work is needed to determine whether these estimates are generalisable, consistent, and useful.
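The sub-sampling design described in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis code: it assumes Cronbach's alpha as the internal consistency statistic (the abstract does not name one), uses simulated item scores in place of the study data, and draws each random sub-sample independently from the full sample. The function names `cronbach_alpha`, `halving_sizes`, and `simulate_scale` are hypothetical.

```python
import random
import statistics


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_vars = [statistics.variance([r[i] for r in rows]) for i in range(k)]
    total_var = statistics.variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def halving_sizes(n_total, n_min):
    """Sizes from n_total downward in roughly 50% steps, ending exactly at n_min.

    The last halved size that cannot be halved again without falling below
    n_min is replaced by n_min itself.
    """
    sizes = []
    n = n_total
    while n // 2 >= n_min:
        sizes.append(n)
        n //= 2
    sizes.append(n_min)
    return sizes


def simulate_scale(n, k, rng):
    """Hypothetical data: k items, each a shared trait plus independent noise."""
    rows = []
    for _ in range(n):
        trait = rng.gauss(0, 1)
        rows.append([trait + rng.gauss(0, 1) for _ in range(k)])
    return rows


def demo(seed=0):
    rng = random.Random(seed)
    full_sample = simulate_scale(713, 10, rng)  # 713 matches Study 1's total n
    for n in halving_sizes(713, 20):
        # Assumption: each sub-sample is an independent random draw
        # from the full sample, not nested in the previous one.
        sub_sample = rng.sample(full_sample, n)
        print(f"n = {n:3d}  alpha = {cronbach_alpha(sub_sample):.3f}")


if __name__ == "__main__":
    demo()
```

Comparing the alpha values across the printed sample sizes shows the kind of stability check the abstract reports. The same loop could be repeated for item-total or test-retest correlations.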