Crossley Jim, Russell Jean, Jolly Brian, Ricketts Chris, Roberts Chris, Schuwirth Lambert, Norcini John
Academic Unit of Medical Education, University of Sheffield, Sheffield, UK.
Med Educ. 2007 Oct;41(10):926-34. doi: 10.1111/j.1365-2923.2007.02843.x.
Investigators applying generalisability theory to educational research and evaluation have sometimes done so poorly. The main difficulties have related to: inadequate or non-random sampling of effects, dealing with naturalistic data, and interpreting and presenting variance components.
This paper addresses these areas of difficulty and articulates an informal consensus among medical educators from Europe, Australia and the USA who are familiar with generalisability theory.
We make the following recommendations. Ensure that all relevant factors are sampled, and that the sampling meets the theory's assumption that the conditions represent a random and representative sample of the factor's 'universe'. Research evaluations will require large samples of each factor if they are to generalise adequately. Where feasible, conduct two separate studies (pilot and evaluation, or Generalisability and Decision studies). For unbalanced data, use either urGENOVA or, if the data are too complex, one of the following procedures in SPSS or SAS: minimum norm quadratic unbiased estimator (MINQUE), maximum likelihood (ML) or restricted maximum likelihood (REML). State which mathematical procedure was used and the degrees of freedom (d.f.) of the effect estimates. If the procedure does not report d.f., re-analyse with type III sum of squares ANOVA (ANOVA SS III) and report these d.f. Describe and justify the regression model used. Present the raw variance components. Describe the effects that they represent in plain, non-statistical language. If standard error of measurement (SEM) or reliability coefficients are presented, give the equations used to calculate them. Make sure that the method of reporting reliability (precision or discrimination) is appropriate to the purpose of the assessment. This will usually demand a precision indicator such as SEM. Consider a graphical presentation to combine precision and discrimination.
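The recommendation to present raw variance components, and to give the equations behind any SEM or reliability coefficient, can be illustrated with a minimal Python sketch. It rests on assumptions that go beyond the abstract: a fully crossed, balanced persons x items design with one score per cell (not the unbalanced, naturalistic case the recommendations mainly target), the textbook expected-mean-squares estimators, and hypothetical function names (`g_study`, `d_study`) and made-up scores chosen only for illustration.

```python
import numpy as np

def g_study(scores):
    """Estimate raw variance components for a balanced persons x items design.

    Assumption: rows are persons, columns are items, one score per cell.
    Uses expected mean squares from a two-way ANOVA without replication.
    """
    x = np.asarray(scores, dtype=float)
    n_p, n_i = x.shape
    grand = x.mean()
    p_means = x.mean(axis=1)
    i_means = x.mean(axis=0)

    df_p, df_i = n_p - 1, n_i - 1
    df_res = df_p * df_i

    ms_p = n_i * np.sum((p_means - grand) ** 2) / df_p
    ms_i = n_p * np.sum((i_means - grand) ** 2) / df_i
    resid = x - p_means[:, None] - i_means[None, :] + grand
    ms_res = np.sum(resid ** 2) / df_res

    return {
        "p": (ms_p - ms_res) / n_i,   # person (true score) variance
        "i": (ms_i - ms_res) / n_p,   # item difficulty variance
        "pi,e": ms_res,               # interaction confounded with error
        "df": {"p": df_p, "i": df_i, "pi,e": df_res},
    }

def d_study(components, n_items):
    """Project precision and discrimination for a test of n_items items."""
    var_p = components["p"]
    rel_err = components["pi,e"] / n_items                      # relative error variance
    abs_err = (components["i"] + components["pi,e"]) / n_items  # absolute error variance
    return {
        "sem_relative": np.sqrt(rel_err),
        "sem_absolute": np.sqrt(abs_err),
        "g_coefficient": var_p / (var_p + rel_err),  # discrimination, relative decisions
        "phi": var_p / (var_p + abs_err),            # dependability, absolute decisions
    }

# Illustrative (invented) scores: 4 persons rated on 3 items.
example = [[6, 5, 7],
           [4, 4, 5],
           [8, 6, 9],
           [5, 3, 6]]
vc = g_study(example)
print("variance components:", vc)
print("D study with 10 items:", d_study(vc, n_items=10))
```

Reporting both the SEM values (precision) and the coefficients (discrimination) from `d_study` keeps the choice of indicator tied to the purpose of the assessment. Estimates from this method can be negative in small samples; the sketch reports them raw rather than truncating at zero.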