Liao Jason J Z
Int J Biostat. 2015 May;11(1):125-33. doi: 10.1515/ijb-2014-0030.
In medicine and related sciences, clinical or experimental measurements usually serve as the basis for diagnostic, prognostic, therapeutic, and performance evaluations. Examples include assessing the reliability of multiple raters (or measurement methods), assessing whether a local or a central laboratory is suitable for tumor evaluation in a randomized clinical trial (RCT), validating surrogate endpoints in a study, and determining whether important outcome measurements are interchangeable among the evaluators in an RCT. No study design, however elegant, can overcome the damage caused by unreliable measurement. Many methods have been developed to assess the agreement between two measurement methods. However, little attention has been paid to quantifying how good that agreement is. In this paper, by analogy with the type I error and the power used to characterize a hypothesis test, we propose quantifying an agreement assessment using two rates: the discordance rate and the tolerance probability. The approach is demonstrated through examples.