Nikolai Bogduk
The University of Newcastle, PO Box 431, East Maitland, NSW, 2323, Australia.
Interv Pain Med. 2022 Aug 15;1(Suppl 2):100124. doi: 10.1016/j.inpm.2022.100124. eCollection 2022.
For professional practice to be responsible, any diagnostic tests used must be reliable. Therefore, the reliability of any diagnostic test needs to have been measured. The classical statistic for quantifying reliability is Kappa. Although Kappa can be determined promptly using a programmed calculator, deriving Kappa with an algorithm provides greater insight into what it is actually measuring and why. Kappa scores can be graded, with verbal descriptors applied to different grades. However, those descriptors do not necessarily reflect the degree of skill required to achieve different grades of Kappa. High levels of skill attract high Kappa scores, but Kappa scores described as fair or moderate are not necessarily flattering, because they can be achieved with questionable levels of skill. Various corrections can be applied to the calculation of Kappa scores in order to raise their value, and to improve the verbal descriptors of their grade, but these may not be legitimate or necessary. Low Kappa scores do not condemn tests, but they do raise questions about their reliability.
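The derivation the abstract alludes to can be sketched as follows. This is a minimal illustration of Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance; the function name and the example counts are hypothetical, not taken from the article.

```python
def cohen_kappa(table):
    """Compute kappa = (p_o - p_e) / (1 - p_e) for a square agreement table.

    table[i][j] = number of cases that rater A assigned to category i
    and rater B assigned to category j.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    # Observed agreement: proportion of cases on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: sum of products of the marginal proportions.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Two raters, binary test (positive/negative), hypothetical counts:
#            B: pos  B: neg
# A: pos        40      10
# A: neg         5      45
print(round(cohen_kappa([[40, 10], [5, 45]]), 3))  # → 0.7
```

Working through the example by hand shows what Kappa measures: the raters agree on 85% of cases (p_o = 0.85), but chance alone would produce 50% agreement (p_e = 0.5), so only the excess over chance, scaled to the maximum possible excess, is credited to skill.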