Lee DongHyuk, Lahiri Soumendra N, Sinha Samiran
Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland.
Department of Statistics, North Carolina State University, Raleigh, North Carolina.
Biometrics. 2020 Sep;76(3):821-833. doi: 10.1111/biom.13207. Epub 2020 Jan 6.
When the observed data are contaminated with errors, standard two-sample testing approaches that ignore measurement errors may produce misleading results, including a type-I error rate higher than the nominal level. To tackle this inconsistency, a nonparametric test is proposed for testing equality of two distributions when the observed contaminated data follow the classical additive measurement error model. The proposed test takes into account the presence of errors in the observed data, and the test statistic is defined in terms of the (deconvoluted) characteristic functions of the latent variables. The proposed method is applicable to a wide range of scenarios, as no parametric restrictions are imposed on either the distribution of the underlying latent variables or the distribution of the measurement errors. The asymptotic null distribution of the test statistic is derived; it is given by an integral of a squared Gaussian process with a complicated covariance structure. For data-based calibration of the test, a new nonparametric bootstrap method is developed under the two-sample measurement error framework, and its validity is established. Finite-sample performance of the proposed test is investigated through simulation studies, and the results show that the proposed method outperforms the standard tests, which exhibit inconsistent behavior. Finally, the proposed method is applied to real data sets from the National Health and Nutrition Examination Survey. An R package, MEtest, is available through CRAN.
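To make the idea of a statistic built from deconvoluted characteristic functions concrete, here is a minimal Python sketch. It is not the authors' procedure and not the MEtest API. It assumes, purely for illustration, that the measurement errors are normal with known standard deviations, whereas the paper imposes no parametric form on the errors. Under the additive model W = X + U with X independent of U, the characteristic functions satisfy φ_W(t) = φ_X(t) φ_U(t), so φ_X can be recovered by division. The function names, the integration range `t_max`, and the simulated data are all hypothetical.

```python
import numpy as np

def ecf(sample, t):
    """Empirical characteristic function of `sample` on the grid `t`."""
    return np.exp(1j * np.outer(t, sample)).mean(axis=1)

def deconvolved_cf_statistic(w1, w2, sigma_u1, sigma_u2, t_max=2.0, n_grid=201):
    """Integrated squared distance between deconvolved ECFs on [-t_max, t_max].

    Illustrative assumption (not from the paper): errors are N(0, sigma^2)
    with known sigma, so phi_U(t) = exp(-sigma^2 t^2 / 2).
    """
    t = np.linspace(-t_max, t_max, n_grid)
    phi_u1 = np.exp(-0.5 * (sigma_u1 * t) ** 2)
    phi_u2 = np.exp(-0.5 * (sigma_u2 * t) ** 2)
    # Deconvolution step: phi_X(t) = phi_W(t) / phi_U(t) for each sample.
    diff = ecf(w1, t) / phi_u1 - ecf(w2, t) / phi_u2
    dt = t[1] - t[0]
    return float(np.sum(np.abs(diff) ** 2) * dt)  # simple Riemann sum

# Simulated example: the two samples have different error variances.
rng = np.random.default_rng(0)
n = 2000
s1, s2 = 0.5, 1.0                                         # hypothetical error SDs
w1 = rng.normal(0.0, 1.0, n) + rng.normal(0.0, s1, n)     # latent X1 ~ N(0, 1)
w2_same = rng.normal(0.0, 1.0, n) + rng.normal(0.0, s2, n)   # X2 ~ N(0, 1)
w2_shift = rng.normal(1.5, 1.0, n) + rng.normal(0.0, s2, n)  # X2 ~ N(1.5, 1)

stat_null = deconvolved_cf_statistic(w1, w2_same, s1, s2)
stat_alt = deconvolved_cf_statistic(w1, w2_shift, s1, s2)
print(stat_null, stat_alt)
```

In this simulation the statistic should be small when the latent distributions coincide, even though the observed samples have different spreads, and clearly larger when the latent distributions differ. Turning the statistic into a test requires a reference distribution, which the paper obtains from its asymptotic theory and bootstrap calibration; that step is not reproduced here.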