Division of Mathematical Science, Graduate School of Engineering Science, Osaka University, Osaka, Japan; ONO Pharmaceutical Co., Ltd., Osaka, Japan.
Stat Med. 2013 Dec 20;32(29):5091-105. doi: 10.1002/sim.5962. Epub 2013 Aug 29.
Medical diagnostic tests must have appropriate validity and high reliability to qualify as adequate assessment tools. In the absence of a gold standard test, the available diagnostic tests are imperfect, so their reliability must be evaluated precisely. Kappa coefficients are often used to assess the reliability of two or more medical diagnostic tests. However, these statistics are imprecise in the typical case where the prevalence rate of the target disease is unknown. Latent class models could also be used to assess reliability, but with only two tests they cannot estimate it, because the model is unidentifiable (there are not enough degrees of freedom). An alternative approach for two tests is to stratify the two-by-two contingency table, assuming that the sensitivity and specificity of each test are the same in all strata and that the prevalence rates differ between strata. Because stratification is essentially a multi-sample analysis, it should not be applied when the subsamples (i.e., centers) are randomly selected from a larger population. In this article, we propose a mixed-effects model to evaluate the reliability of two tests in trials conducted at multiple randomly selected centers. Several types of distributions for the prevalence rates across subpopulations are considered. Simulation studies show that the proposed method performs well. An analysis of real data is also reported.
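The reasoning in the abstract (kappa depends on prevalence, a single two-test table is unidentifiable, stratification restores degrees of freedom, and a random-effects model treats center prevalence as drawn from a distribution) can be made concrete with a small sketch. This is an illustration under stated assumptions, not the authors' code: it assumes the two tests are conditionally independent given true disease status, and for the random-effects part it assumes a Beta(a, b) distribution for center-level prevalence as one illustrative choice (the abstract does not name the distributions it considers). The function names `cell_probs`, `cohen_kappa`, `df_check`, and `center_loglik_beta` are hypothetical, as are the center counts and parameter values in the usage lines.

```python
import math


def cell_probs(prev, se1, sp1, se2, sp2):
    """Cell probabilities (T1+T2+, T1+T2-, T1-T2+, T1-T2-) for one subject.

    Assumes the two tests are conditionally independent given true disease
    status (a standard latent class assumption, not stated in the abstract).
    """
    diseased = (se1 * se2, se1 * (1 - se2), (1 - se1) * se2, (1 - se1) * (1 - se2))
    healthy = ((1 - sp1) * (1 - sp2), (1 - sp1) * sp2, sp1 * (1 - sp2), sp1 * sp2)
    return tuple(prev * d + (1 - prev) * h for d, h in zip(diseased, healthy))


def cohen_kappa(p):
    """Cohen's kappa computed from the four cell probabilities of a 2x2 table."""
    p11, p10, p01, p00 = p
    observed = p11 + p00                      # probability the two tests agree
    t1_pos, t2_pos = p11 + p10, p11 + p01     # marginal positive rates
    expected = t1_pos * t2_pos + (1 - t1_pos) * (1 - t2_pos)  # chance agreement
    return (observed - expected) / (1 - expected)


def df_check(n_strata, n_tests=2):
    """Free cell probabilities vs. unknown parameters when each test's Se/Sp is
    shared across strata and each stratum has its own prevalence.
    Having at least as many free cells as parameters is necessary, but not
    sufficient, for identifiability."""
    free_cells = n_strata * (2 ** n_tests - 1)
    params = 2 * n_tests + n_strata
    return free_cells, params


def center_loglik_beta(counts, se1, sp1, se2, sp2, a, b, n_grid=2000):
    """Marginal log-likelihood (multinomial constant dropped) of one center's
    2x2 counts when its prevalence is a random effect drawn from Beta(a, b)
    (an assumed distribution). The integral over prevalence uses the midpoint
    rule with log-sum-exp. Requires 0 < Se, Sp < 1 so all cells are positive."""
    log_beta_fn = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    h = 1.0 / n_grid
    terms = []
    for i in range(n_grid):
        prev = (i + 0.5) * h  # midpoint of the i-th subinterval of (0, 1)
        p = cell_probs(prev, se1, sp1, se2, sp2)
        log_lik = sum(n * math.log(pk) for n, pk in zip(counts, p))
        log_dens = ((a - 1) * math.log(prev) + (b - 1) * math.log(1 - prev)
                    - log_beta_fn)
        terms.append(log_lik + log_dens + math.log(h))
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


if __name__ == "__main__":
    # Same Se = Sp = 0.9 for both tests; only the prevalence changes.
    for prev in (0.5, 0.05):
        print(prev, round(cohen_kappa(cell_probs(prev, 0.9, 0.9, 0.9, 0.9)), 3))
    # Free cells vs. parameters for 1, 2 and 3 strata with two tests.
    for s in (1, 2, 3):
        print(s, df_check(s))
    # Hypothetical counts for one center at illustrative parameter values.
    print(center_loglik_beta((30, 10, 8, 152), 0.9, 0.95, 0.85, 0.9, 2.0, 8.0))
```

With Se = Sp = 0.9 for both tests, kappa is about 0.64 at 50% prevalence but only about 0.25 at 5% prevalence, even though the tests themselves are unchanged. A single two-test table has 3 free cell probabilities but 5 unknowns (two sensitivities, two specificities, one prevalence), which is the lack of degrees of freedom mentioned above; two strata give 6 vs 6 and three give 9 vs 7. In the random-effects sketch, the unknowns are the sensitivities, specificities, and the Beta parameters (a, b) rather than one prevalence per center; one would sum `center_loglik_beta` over centers and maximize, though the paper's actual estimation procedure may differ.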