Power of tests for a dichotomous independent variable measured with error.

Authors

McCaffrey Daniel F, Elliott Marc N

Affiliation

RAND Corporation, 4570 Fifth Avenue, Suite 600, Pittsburgh, PA 15213, USA.

Publication

Health Serv Res. 2008 Jun;43(3):1085-101. doi: 10.1111/j.1475-6773.2007.00810.x.

Abstract

OBJECTIVE

To examine the implications for statistical power of using predicted probabilities for a dichotomous independent variable, rather than the actual variable.

DATA SOURCES/STUDY SETTING

An application uses 271,479 observations from the 2000 to 2002 CAHPS Medicare Fee-for-Service surveys.

STUDY DESIGN AND DATA

A methodological study with simulation results and a substantive application to previously collected data.

PRINCIPAL FINDINGS

Researchers often must use key dichotomous predictors that are unobserved but for which predictions exist. We consider three approaches to such data: (1) the classification estimator; (2) the direct substitution estimator; and (3) the partial information maximum likelihood estimator (PIMLE). The efficiency of (1), meaning its power relative to testing with the true variable, roughly scales with the square of one minus the classification error. The efficiency of (2) roughly scales with the R² for predicting the unobserved dichotomous variable, and (2) is usually more powerful than (1). Approach (3) is the most powerful. However, for testing differences in means of 0.2-0.5 standard deviations, (2) is typically more than 95 percent as efficient as (3).
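The two scaling rules for (1) and (2) can be checked with a back-of-the-envelope calculation. This is a sketch under stated assumptions, not the authors' derivation, and it does not cover the PIMLE. It assumes:

- a linear model Y = Δ·X + ε with residual SD σ;
- calibrated predicted probabilities π (E[X | π] = π) that take finitely many values;
- a classification rule X̂ = 1 when π exceeds a cutoff.

Efficiency is taken as the ratio of asymptotic squared t-statistics, which is the same as the ratio of effective sample sizes. The names `relative_efficiency`, `Efficiencies`, `cutoff`, and `sigma` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Efficiencies:
    r_squared: float             # R^2 of the prediction: Var(pi) / Var(X)
    classification_error: float  # P(X=0 | Xhat=1) + P(X=1 | Xhat=0)
    direct_substitution: float   # efficiency of regressing Y on pi
    classification: float        # efficiency of regressing Y on 1{pi > cutoff}


def relative_efficiency(pis, weights, delta, cutoff=0.5, sigma=1.0):
    """Asymptotic efficiency relative to the test that uses the true X.

    Hypothetical helper. Assumes Y = delta * X + e with Var(e) = sigma^2,
    X ~ Bernoulli(pi) with calibrated predictions (E[X | pi] = pi), and pi
    taking the values in `pis` with probabilities proportional to `weights`.
    """
    total = sum(weights)
    w = [wi / total for wi in weights]
    p = sum(wi * pi for wi, pi in zip(w, pis))
    var_x = p * (1 - p)
    var_pi = sum(wi * pi * pi for wi, pi in zip(w, pis)) - p * p

    # Test with the true X: slope delta, residual variance sigma^2.
    ncp_true = delta ** 2 * var_x / sigma ** 2

    # (2) Direct substitution: the slope on pi is still delta,
    # but X - pi (variance Var(X) - Var(pi)) moves into the residual.
    resid_direct = sigma ** 2 + delta ** 2 * (var_x - var_pi)
    ncp_direct = delta ** 2 * var_pi / resid_direct

    # (1) Classification: Xhat = 1 if pi > cutoff, else 0.
    q = sum(wi for wi, pi in zip(w, pis) if pi > cutoff)
    if not 0.0 < q < 1.0:
        raise ValueError("cutoff must split the predicted probabilities")
    a1 = sum(wi * pi for wi, pi in zip(w, pis) if pi > cutoff) / q         # P(X=1 | Xhat=1)
    a0 = sum(wi * pi for wi, pi in zip(w, pis) if pi <= cutoff) / (1 - q)  # P(X=1 | Xhat=0)
    # The slope on Xhat is attenuated to delta * (a1 - a0) = delta * (1 - error).
    explained = delta ** 2 * (a1 - a0) ** 2 * q * (1 - q)
    ncp_class = explained / (sigma ** 2 + delta ** 2 * var_x - explained)

    return Efficiencies(
        r_squared=var_pi / var_x,
        classification_error=(1 - a1) + a0,
        direct_substitution=ncp_direct / ncp_true,
        classification=ncp_class / ncp_true,
    )


# Hypothetical predictions: pi is 0.1, 0.5 or 0.9 with equal probability.
eff = relative_efficiency([0.1, 0.5, 0.9], [1, 1, 1], delta=0.2)
print(round(eff.r_squared, 3), round(eff.direct_substitution, 3), round(eff.classification, 3))
# prints: 0.427 0.424 0.318
```

In this made-up example the classification error is 0.4, so (1 − e)² = 0.36. The classified group is also smaller than the true prevalence, which pulls classification down to about 32 percent efficiency. Direct substitution stays close to R² at about 42 percent.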

CONCLUSIONS

The information loss from not observing actual values of dichotomous predictors can be quite large. Direct substitution is easy to implement and interpret and nearly as efficient as the PIMLE.
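Direct substitution is easy to interpret because, with calibrated predictions, the OLS coefficient on π estimates the same mean difference Δ as the coefficient on the true X. The only cost is extra noise. The simulation below illustrates this under hypothetical settings, none of which come from the CAHPS application:

- predictions drawn uniformly on (0, 1), which gives R² = 1/3;
- X ~ Bernoulli(π);
- Y = 0.5·X + N(0, 1) noise.

The helper `ols_slope` is also illustrative.

```python
import numpy as np


def ols_slope(regressor, y):
    """OLS slope of y on a single regressor with an intercept (hypothetical helper)."""
    r = np.asarray(regressor, dtype=float)
    y = np.asarray(y, dtype=float)
    rc = r - r.mean()
    return float(rc @ (y - y.mean()) / (rc @ rc))


def simulate(n=200_000, delta=0.5, seed=0):
    rng = np.random.default_rng(seed)
    pi = rng.uniform(0.0, 1.0, size=n)          # calibrated predictions: P(X=1 | pi) = pi
    x = (rng.random(n) < pi).astype(float)      # true dichotomous variable (unobserved in practice)
    y = delta * x + rng.normal(0.0, 1.0, size=n)
    return ols_slope(x, y), ols_slope(pi, y)    # true-X estimate, direct-substitution estimate


true_slope, substituted_slope = simulate()
print(f"slope on X: {true_slope:.3f}, slope on pi: {substituted_slope:.3f}")
```

Both slopes should land near 0.5. The slope on π has a standard error roughly 1/√R² ≈ 1.7 times larger, and that larger standard error is the power loss R² summarizes.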

