Department of Epidemiology and Biostatistics, McGill University, 1020 Pine Avenue West, Montreal, Quebec, H3A 1A2, Canada.
Stat Med. 2011 Sep 20;30(21):2648-62. doi: 10.1002/sim.4320. Epub 2011 Jul 22.
There is now a large literature on the analysis of diagnostic test data. In the absence of a gold standard test, latent class analysis is most often used to estimate the prevalence of the condition of interest and the properties of the diagnostic tests. When test results are measured on a continuous scale, both parametric and nonparametric models have been proposed. Parametric methods such as the commonly used bi-normal model may not fit the data well; nonparametric methods developed to date have been relatively complex to apply in practice, and their properties have not been carefully evaluated in the diagnostic testing context. In this paper, we propose a simple yet flexible Bayesian nonparametric model which approximates a Dirichlet process for continuous data. We compare results from the nonparametric model with those from the bi-normal model via simulations, investigating both how much is lost in using a nonparametric model when the bi-normal model is correct and how much can be gained in using a nonparametric model when normality does not hold. We also carefully investigate the trade-offs that occur between flexibility and identifiability of the model as different Dirichlet process prior distributions are used. Motivated by an application to tuberculosis clustering, we extend our nonparametric model to accommodate two additional dichotomous tests and analyze these data using both the continuous test alone and all three tests together.
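The abstract contrasts the bi-normal latent class model with a nonparametric model that approximates a Dirichlet process, but it does not give the construction. As a rough illustration only, the Python sketch below writes out the bi-normal mixture log-likelihood for a single continuous test without a gold standard, and a truncated stick-breaking approximation to a Dirichlet process mixture of normals. The truncation level, the shared standard deviation, the N(0, 2^2) base measure, and all function names are assumptions made for illustration; this is not the authors' model.

```python
import math
import random


def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def binormal_loglik(ys, prev, mu0, sd0, mu1, sd1):
    """Log-likelihood of the two-class (bi-normal) latent class model.

    Each test result y comes from N(mu1, sd1^2) with probability `prev`
    (latent class: condition present) or from N(mu0, sd0^2) with
    probability 1 - prev (condition absent). The class is unobserved.
    """
    return sum(
        math.log(prev * normal_pdf(y, mu1, sd1)
                 + (1.0 - prev) * normal_pdf(y, mu0, sd0))
        for y in ys
    )


def stick_breaking_weights(alpha, n_atoms, rng):
    """Truncated stick-breaking weights for a Dirichlet process.

    V_k ~ Beta(1, alpha) and w_k = V_k * prod_{j<k} (1 - V_j).
    The final weight takes whatever stick remains, so the weights sum to 1.
    (Truncation at `n_atoms` is an illustrative assumption.)
    """
    weights, remaining = [], 1.0
    for _ in range(n_atoms - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


def dp_mixture_density(y, weights, means, sd):
    """Density at y of a truncated DP mixture of normals with a common sd."""
    return sum(w * normal_pdf(y, m, sd) for w, m in zip(weights, means))


# Usage: draw one random density from the truncated DP prior.
rng = random.Random(1)
weights = stick_breaking_weights(alpha=1.0, n_atoms=20, rng=rng)
# Atom locations from an assumed N(0, 2^2) base measure.
means = [rng.gauss(0.0, 2.0) for _ in weights]
density_at_zero = dp_mixture_density(0.0, weights, means, sd=0.5)
```

Under these assumptions, one such mixture could replace the single normal density within each latent class. A smaller concentration `alpha` tends to put most of the weight on a few atoms, giving a less flexible density, which is one way to picture the flexibility versus identifiability trade-off that the abstract says depends on the Dirichlet process prior.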