Meyer G J
Department of Psychology, University of Alaska Anchorage, USA.
J Pers Assess. 2000 Aug;75(1):46-81. doi: 10.1207/S15327752JPA7501_6.
Wood et al.'s (1999b) article contained several general points that are quite sound. Conducting research with an extreme groups design does produce effect sizes that are larger than those observed in an unselected population. Appropriate control groups are important for any study that wishes to shed light on the characteristics of a targeted experimental group, and experimental validity is enhanced when researchers collect data from both groups simultaneously. Diagnostic efficiency statistics--or any summary measures of test validity--should be trusted more when they are drawn from multiple studies conducted by different investigators across numerous settings rather than from a single investigator's work. There should be no question that these points are correct. However, I have pointed out numerous problems with specific aspects of Wood et al.'s (1999b) article. Wood et al. gave improper citations that claimed researchers found or said things that they did not. Wood et al. indicated my data set did not support the incremental validity of the Rorschach over the MMPI-2 when, in fact, my study never reported such an analysis and my data actually reveal that the opposite conclusion is warranted. Wood et al. asserted there was only one proper way to conduct incremental validity analyses even though experts have described how their recommended procedure can lead to significant complications. Wood et al. cited a section of Cohen and Cohen (1983) to bolster their claim that hierarchical and stepwise regression procedures were incompatible and to criticize Burns and Viglione's (1996) regression analysis. However, that section of Cohen and Cohen's text actually contradicted Wood et al.'s argument. Wood et al. 
tried to convince readers that Burns and Viglione used improper alpha levels and drew improper conclusions from their regression data, although Burns and Viglione had followed the research evidence on this topic and the expert recommendations provided in Hosmer and Lemeshow's (1989) classic text. Wood et al. oversimplified issues associated with extreme groups research designs and erroneously suggested that diagnostic studies were immune from the interpretive confounds that can be associated with this type of design. Wood et al. ignored or dismissed the valid reasons why Burns and Viglione used an extreme groups design, and they never mentioned how Burns and Viglione used a homogeneous sample that actually was likely to find smaller-than-normal effect sizes. Wood et al. also overlooked the fact that Burns and Viglione identified their results as applying to female nonpatients; they never suggested their findings would characterize those obtained from a clinical sample. Wood et al. criticized composite measures although some of the most important and classic findings in the history of research on personality recommend composite measures as a way to minimize error and maximize validity. Wood et al. also were mistaken about the elements that constitute an optimal composite measure. Wood et al. apparently ignored the factor-analytic evidence that demonstrated how Burns and Viglione created a reasonable composite scale, and Wood et al. similarly ignored the clear evidence that supported the content and criterion-related validity of the EMRF. With respect to the HEV, Wood et al. created a z-score formula that used the wrong means and standard deviations. They continued to use this formula despite being informed that it was incorrect. Subsequently, Wood et al. 
told readers that their faulty z-score formula was "incompatible" with the proper weighted formula and asserted that the two formulas "do not yield identical results" and "do not yield HEV scores that are identical or even very close." These published claims were made even though Wood et al. had seen the results from eight large samples, all of which demonstrated that their wrong formula had correlations greater than .998 with the correct formula. At worst, it seems that Wood et al. (199