Qiu Shi-Fang, He Jie, Tao Ji-Ran, Tang Man-Lai, Poon Wai-Yin
Department of Statistics, Chongqing University of Technology, Chongqing, People's Republic of China.
School of Mathematics and Statistics, Beijing Institute of Technology, Beijing, People's Republic of China.
J Appl Stat. 2019 Oct 17;47(8):1375-1401. doi: 10.1080/02664763.2019.1679727. eCollection 2020.
Disease prevalence can be estimated by classifying subjects according to whether they have the disease. When a gold-standard test is too expensive to apply to all subjects, partially validated data can be obtained by double sampling, in which all individuals are classified by a fallible classifier and a subset of them is also validated by the gold-standard classifier. In practice, however, such an infallible classifier may not be available. In this article, we consider two models in which both classifiers are fallible and propose four asymptotic test procedures for comparing disease prevalence between two groups. The corresponding sample size formulae, and the validated ratio given the total sample sizes, are also derived and evaluated. Simulation results show that (i) the score test performs well, and its sample size formula is accurate in terms of empirical power and size under both models; (ii) the Wald test based on the variance estimator with parameters estimated under the null hypothesis outperforms the other tests even for small sample sizes under Model II, and the sample size estimated from this test is also accurate; and (iii) the estimated validated ratios based on all tests are accurate. Malaria data are used to illustrate the proposed methodologies.
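Point (ii) in the abstract turns on where the variance in a Wald statistic is evaluated. The following is a minimal sketch of that distinction, under a strong simplifying assumption: it compares two ordinary binomial proportions with no misclassification and no double sampling, so it is not the article's Model I or Model II, and it does not reproduce the authors' test statistics. The function names `normal_sf` and `wald_two_proportions` are hypothetical and introduced only for illustration.

```python
import math


def normal_sf(z):
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def wald_two_proportions(x1, n1, x2, n2, null_variance=False):
    """Two-sided Wald-type test of H0: p1 == p2 for two binomial samples.

    ASSUMPTION: error-free classification (plain binomial counts).
    null_variance=False evaluates the variance at the unrestricted estimates.
    null_variance=True evaluates it at the pooled estimate, i.e. with the
    parameters estimated under the null hypothesis.
    """
    p1, p2 = x1 / n1, x2 / n2
    if null_variance:
        pooled = (x1 + x2) / (n1 + n2)
        var = pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2)
    else:
        var = p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2
    if var <= 0.0:
        raise ValueError("variance estimate is zero; the statistic is undefined")
    z = (p1 - p2) / math.sqrt(var)
    p_value = 2.0 * normal_sf(abs(z))
    return z, p_value


if __name__ == "__main__":
    # Illustrative counts only; these are not data from the article.
    for flag in (False, True):
        z, p = wald_two_proportions(30, 100, 20, 100, null_variance=flag)
        print(f"null_variance={flag}: z={z:.4f}, p={p:.4f}")
```

The two variants share the same numerator and differ only in the variance estimate, which is why their small-sample behaviour can differ even though they are asymptotically equivalent under the null hypothesis.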