Doty R L, McKeown D A, Lee W W, Shaman P
Department of Otorhinolaryngology: Head and Neck Surgery, University of Pennsylvania Medical Center, Philadelphia 19104, USA.
Chem Senses. 1995 Dec;20(6):645-56. doi: 10.1093/chemse/20.6.645.
Ten tests of olfactory function (including tests of odor identification, detection, discrimination, memory, and suprathreshold odor intensity and pleasantness perception) were administered on two test occasions to 57 subjects ranging in age from 18 to 83 years. The stability of the average test scores was determined across the two test sessions for 14 measures derived from these 10 tests and for subcomponents of the Japanese T&T olfactometer threshold test. In addition, the test-retest reliability (Pearson r) of each test measure was established. With the exception of a response bias measure, the average test scores did not differ significantly across the two test sessions. Statistically, the reliability coefficients of the primary test measures fell into three general classes bounded by the following r values: 0.43-0.53; 0.67-0.71; 0.76-0.90. Detection threshold values were more reliable than recognition threshold values; those based upon a single ascending presentation series were much less reliable than those based upon a staircase procedure. The relationship between test length and reliability was examined for several of the tests and mathematically modeled. For example, within the staircase series incorporating the odorant phenyl ethyl alcohol, reliability was related (R^2 = 0.984) to the number of reversals included in the threshold estimate by a function derived from the Spearman-Brown formula; namely, reliability = 0.455 n / [1 + 0.455 (n - 1)], where n is the number of reversals. Reversal location, per se, had little influence on reliability. Overall, this study suggests that (i) considerable variation is present in the reliability of olfactory tests, (ii) reliability is a function of test length, and (iii) caution is warranted in comparing results from nominally different olfactory tests in applied settings since the findings may, in some instances, simply reflect the differential reliability of the tests.
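The Spearman-Brown relation quoted in the abstract can be sketched in a few lines of Python. This is only an illustration of the stated function, not the authors' analysis code: the function name `spearman_brown`, the constant name `R_SINGLE_REVERSAL`, and the reversal counts in the loop are assumptions chosen for the example. Only the coefficient 0.455 comes from the abstract.

```python
def spearman_brown(r_single: float, n: float) -> float:
    """Predicted reliability of a measure lengthened n-fold.

    r_single is the reliability of one unit of the test (here, an
    estimate based on a single staircase reversal) and n is the number
    of units combined into the final score.
    """
    return n * r_single / (1.0 + r_single * (n - 1.0))


# Coefficient reported in the abstract for the phenyl ethyl alcohol staircase.
R_SINGLE_REVERSAL = 0.455

if __name__ == "__main__":
    # Reversal counts below are illustrative, not values from the study.
    for n_reversals in (1, 2, 4, 7):
        predicted = spearman_brown(R_SINGLE_REVERSAL, n_reversals)
        print(f"{n_reversals} reversal(s): predicted reliability = {predicted:.3f}")
```

Under this function, predicted reliability rises with each added reversal but with diminishing returns, which is consistent with the abstract's conclusion that reliability depends on test length.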