Magder L S, Sloan M A, Duh S H, Abate J F, Kittner S J
Department of Epidemiology and Preventive Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA.
Stat Med. 2000 Jan 15;19(1):99-111. doi: 10.1002/(sici)1097-0258(20000115)19:1<99::aid-sim327>3.0.co;2-o.
Often, in biomedical research, there are multiple sources of imperfect information regarding a dichotomous variable of interest. For example, in a study we are conducting on the relationship between cocaine use and stroke risk, information on the cocaine use of each study patient is available from three fallible sources: patient interviews, urine toxicology testing, and medical record review. Regression analyses based on a rule for classifying patients from this information can result in biased estimation of associations and variances, because some subjects are misclassified and because the derived classification is treated as certain. We describe a likelihood-based method that directly incorporates multiple sources of information regarding an outcome variable into a regression analysis and takes into account the uncertainty in the classification. The method can be applied when some sources of information are missing for some subjects. We show how the availability of multiple sources can be exploited to generate estimates of the quality (for example, sensitivity and specificity) of each source and to model the degree to which missing data are informative. A fitting algorithm and issues of identifiability are discussed. We illustrate the method using data from our study.
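To make the idea concrete, the following is a minimal Python sketch of one standard way to fit a model of this kind: an EM algorithm for a latent binary outcome D that is observed only through several fallible sources. It is not the authors' implementation, and the abstract does not say which fitting algorithm they use. The sketch assumes a single covariate with P(D=1 | x) = expit(b0 + b1·x), sources that are conditionally independent given D, and missing readings that are non-informative. The paper's model for informative missingness is not reproduced. The function name `fit_latent_logistic` and its argument layout are hypothetical.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_latent_logistic(Y, x, n_iter=500, tol=1e-6):
    """Hypothetical EM sketch (not the authors' code).

    Y: per-subject lists of source readings (1, 0, or None if missing).
    x: per-subject scalar covariate.
    Assumed model: P(D=1 | x) = expit(b0 + b1 * x); sources conditionally
    independent given D; missing readings ignorable (non-informative).
    Returns (b0, b1), sensitivities, specificities, log-likelihood trace.
    """
    k = len(Y[0])
    b0, b1 = 0.0, 0.0
    se, sp = [0.75] * k, [0.75] * k  # values > 0.5 make class 1 the "positive" class
    trace = []
    for _ in range(n_iter):
        # E-step: posterior weight w_i = P(D_i = 1 | observed sources, x_i).
        w, loglik = [], 0.0
        for yi, xi in zip(Y, x):
            l1 = l0 = 1.0
            for j, y in enumerate(yi):
                if y is None:
                    continue  # a missing source contributes nothing
                l1 *= se[j] if y == 1 else 1.0 - se[j]
                l0 *= 1.0 - sp[j] if y == 1 else sp[j]
            p = expit(b0 + b1 * xi)
            marg = p * l1 + (1.0 - p) * l0
            loglik += math.log(marg)
            w.append(p * l1 / marg)
        trace.append(loglik)
        # M-step (a): sensitivity/specificity as weighted proportions over observed readings.
        for j in range(k):
            num1 = den1 = num0 = den0 = 0.0
            for yi, wi in zip(Y, w):
                if yi[j] is None:
                    continue
                num1 += wi * yi[j]
                den1 += wi
                num0 += (1.0 - wi) * (1 - yi[j])
                den0 += 1.0 - wi
            se[j] = min(max(num1 / den1, 1e-9), 1 - 1e-9)
            sp[j] = min(max(num0 / den0, 1e-9), 1 - 1e-9)
        # M-step (b): logistic regression on fractional outcomes w via Newton-Raphson.
        for _ in range(25):
            g0 = g1 = h00 = h01 = h11 = 0.0
            for xi, wi in zip(x, w):
                p = expit(b0 + b1 * xi)
                r, v = wi - p, p * (1.0 - p)
                g0 += r
                g1 += r * xi
                h00 += v
                h01 += v * xi
                h11 += v * xi * xi
            det = h00 * h11 - h01 * h01
            d0 = (h11 * g0 - h01 * g1) / det  # solve the 2x2 Newton system
            d1 = (h00 * g1 - h01 * g0) / det
            b0, b1 = b0 + d0, b1 + d1
            if max(abs(d0), abs(d1)) < 1e-10:
                break
        if len(trace) > 1 and abs(trace[-1] - trace[-2]) < tol:
            break
    return (b0, b1), se, sp, trace
```

On simulated data with three sources of differing sensitivity and specificity, and some readings set to `None`, the returned estimates should lie close to the simulated values, and the log-likelihood trace should be non-decreasing, as EM guarantees. Starting sensitivities and specificities above 0.5 fix which latent class is labeled "positive". Without such a choice the two classes could be swapped.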