From the Department of Surgery and Perioperative Care, Dell Medical School at the University of Texas at Austin, Austin, Texas; and the Department of Anesthesiology, VU University Medical Center, Amsterdam, the Netherlands.
Anesth Analg. 2018 Jun;126(6):2123-2128. doi: 10.1213/ANE.0000000000002924.
Correlation and agreement are 2 concepts that are widely applied in the medical literature and clinical practice to assess the presence and strength of an association. However, because correlation and agreement are conceptually distinct, they require the use of different statistics. Agreement is closely related to, but fundamentally different from and often confused with, correlation: it refers to the reproducibility of clinical evaluations or biomedical measurements. The intraclass correlation coefficient is a commonly applied measure of agreement for continuous data, and it can be validly applied to assess both intrarater reliability and interrater reliability. As its name implies, the Lin concordance correlation coefficient is another measure of agreement, or concordance. In comparing a new measurement technique with an established one, it is necessary to determine whether the two agree sufficiently well for the new method to replace the old. Bland and Altman demonstrated that a correlation coefficient is not appropriate for assessing the interchangeability of 2 such measurement methods. They instead described an alternative approach, the now widely applied graphical Bland-Altman plot, which is based on a simple estimation of the mean and standard deviation of the differences between measurements made by the 2 methods. In reading a medical journal article that includes the interpretation of diagnostic tests and the application of diagnostic criteria, attention is conventionally focused on characteristics such as sensitivity, specificity, predictive values, and likelihood ratios. However, if the clinicians who interpret a test cannot agree on its interpretation and the resulting, typically dichotomous or binary, diagnosis, the test results will be of little practical use. Such agreement between observers (interobserver agreement) about a dichotomous or binary variable is often reported as the kappa statistic. Assessing interrater agreement also has important biomedical applicability for ordinal variables and data; this situation typically calls for the Cohen weighted kappa. Questionnaires, psychometric scales, and diagnostic tests are widely and increasingly used, not only by researchers but also by clinicians in their daily practice. It is essential that these questionnaires, scales, and diagnostic tests have a high degree of agreement between observers. It is therefore vital that biomedical researchers and clinicians apply the appropriate statistical measures of agreement to assess the reproducibility and quality of these measurement instruments and decision-making processes.
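To make the intraclass correlation coefficient concrete, the following minimal Python sketch (not part of the original article) computes ICC(2,1), the two-way random-effects, absolute-agreement, single-rater form, directly from its ANOVA definition. The function name and the toy ratings matrix are illustrative assumptions.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: array of shape (n_subjects, n_raters) with continuous scores.
    """
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand_mean = ratings.mean()

    # Two-way ANOVA decomposition: subjects (rows), raters (columns), residual.
    ms_rows = k * np.sum((ratings.mean(axis=1) - grand_mean) ** 2) / (n - 1)
    ms_cols = n * np.sum((ratings.mean(axis=0) - grand_mean) ** 2) / (k - 1)
    ss_total = np.sum((ratings - grand_mean) ** 2)
    ss_error = ss_total - (n - 1) * ms_rows - (k - 1) * ms_cols
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Shrout-Fleiss ICC(2,1) formula.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Toy example: 5 subjects each scored by 3 raters (interrater reliability).
scores = [[9, 2, 5], [6, 1, 3], [8, 4, 6], [7, 1, 2], [10, 5, 6]]
print(f"ICC(2,1) = {icc_2_1(scores):.3f}")
```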
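Similarly, the Lin concordance correlation coefficient can be computed from the sample means, variances, and covariance of the paired measurements. Below is a sketch of the standard formula, with invented variable names; following Lin's 1989 definition, it uses population (ddof=0) moments.

```python
import numpy as np

def lin_ccc(x, y):
    """Lin concordance correlation coefficient for paired measurements."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()               # ddof=0, per Lin (1989)
    cov_xy = np.mean((x - mean_x) * (y - mean_y))
    # Penalizes both poor correlation and systematic shifts in location/scale.
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)

# Toy example: the same quantity measured by two devices.
device_a = [10.1, 12.4, 9.8, 11.5, 13.0]
device_b = [10.4, 12.1, 10.0, 11.9, 12.6]
print(f"CCC = {lin_ccc(device_a, device_b):.3f}")
```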
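The Bland-Altman plot described above reduces to plotting the per-subject differences against the per-subject means, with the bias (mean difference) and 95% limits of agreement (bias ± 1.96 SD of the differences) drawn as reference lines. A minimal matplotlib sketch follows; the function name, variable names, and toy data are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

def bland_altman_plot(method_1, method_2):
    """Plot differences vs. means for paired measurements by 2 methods."""
    m1 = np.asarray(method_1, dtype=float)
    m2 = np.asarray(method_2, dtype=float)
    means = (m1 + m2) / 2
    diffs = m1 - m2
    bias = diffs.mean()                  # mean difference (systematic bias)
    loa = 1.96 * diffs.std(ddof=1)       # half-width of 95% limits of agreement

    fig, ax = plt.subplots()
    ax.scatter(means, diffs)
    ax.axhline(bias, color="black", label=f"bias = {bias:.2f}")
    ax.axhline(bias + loa, color="gray", linestyle="--", label="bias ± 1.96 SD")
    ax.axhline(bias - loa, color="gray", linestyle="--")
    ax.set_xlabel("Mean of the 2 methods")
    ax.set_ylabel("Difference (method 1 - method 2)")
    ax.legend()
    return fig

# Toy example: e.g., invasive vs. noninvasive blood pressure readings.
bland_altman_plot([120, 135, 110, 142, 128], [118, 138, 113, 139, 125])
plt.show()
```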
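For context on the conventional diagnostic test characteristics mentioned above, here is a small sketch computing sensitivity, specificity, predictive values, and likelihood ratios from a 2×2 table; the cell counts are invented for illustration.

```python
# Hypothetical 2x2 table of test result vs. true disease status.
tp, fp = 90, 30    # test positive: with disease / without disease
fn, tn = 10, 170   # test negative: with disease / without disease

sensitivity = tp / (tp + fn)            # P(test+ | disease)
specificity = tn / (tn + fp)            # P(test- | no disease)
ppv = tp / (tp + fp)                    # P(disease | test+)
npv = tn / (tn + fn)                    # P(no disease | test-)
lr_pos = sensitivity / (1 - specificity)
lr_neg = (1 - sensitivity) / specificity

print(f"Se={sensitivity:.2f} Sp={specificity:.2f} "
      f"PPV={ppv:.2f} NPV={npv:.2f} LR+={lr_pos:.2f} LR-={lr_neg:.2f}")
```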
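Finally, both the unweighted kappa for binary diagnoses and the Cohen weighted kappa for ordinal ratings are available via scikit-learn's cohen_kappa_score; the rater labels below are fabricated for illustration.

```python
from sklearn.metrics import cohen_kappa_score

# Two clinicians' dichotomous diagnoses (1 = disease present, 0 = absent).
clinician_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
clinician_b = [1, 0, 1, 0, 0, 0, 1, 1, 1, 1]
print("unweighted kappa:", cohen_kappa_score(clinician_a, clinician_b))

# Ordinal severity grades (1-4): weighted kappa credits near-agreement,
# penalizing large disagreements more heavily than small ones.
grades_a = [1, 2, 3, 4, 2, 3, 1, 4]
grades_b = [1, 2, 4, 4, 3, 3, 2, 4]
print("linear-weighted kappa:",
      cohen_kappa_score(grades_a, grades_b, weights="linear"))
print("quadratic-weighted kappa:",
      cohen_kappa_score(grades_a, grades_b, weights="quadratic"))
```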