Prescott Gordon J, Garthwaite Paul H
Department of Public Health, University of Aberdeen, Aberdeen AB25 2ZD, UK.
Stat Med. 2005 Feb 15;24(3):379-401. doi: 10.1002/sim.2000.
Bayesian methods are proposed for analysing matched case-control studies in which a binary exposure variable is sometimes measured with error, but whose correct values have been validated for a random sample of the matched case-control sets. Three models are considered. Model 1 makes few assumptions other than randomness and independence between matched sets, while Models 2 and 3 are logistic models, with Model 3 making additional distributional assumptions about the variation between matched sets. With Models 1 and 2 the data are examined in two stages. The first stage analyses data from the validation sample and is easy to perform; the second stage analyses the main body of data and requires MCMC methods. All relevant information is transferred between the stages by using the posterior distributions from the first stage as the prior distributions for the second stage. With Model 3, a hierarchical structure is used to model the relationship between the exposure probabilities of the matched sets, which gives the potential to extract more information from the data. All the methods that are proposed are generalized to studies in which there is more than one control for each case. The Bayesian methods and a maximum likelihood method are applied to a data set for which the exposure of every patient was measured using both an imperfect measure that is subject to misclassification, and a much better measure whose classifications may be treated as correct. To test methods, the latter information was suppressed for all but a random sample of matched sets.
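The two-stage idea described above, in which first-stage posteriors become second-stage priors, can be illustrated with a deliberately simplified sketch. This is not the paper's Model 1 or Model 2: it assumes independent conjugate Beta priors on the sensitivity and specificity of the imperfect exposure measure, ignores the matching structure, and uses made-up validation counts. The names `Beta`, `stage_one`, and the counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beta:
    """Beta(a, b) distribution, described only by its shape parameters."""
    a: float
    b: float

    def update(self, successes: int, failures: int) -> "Beta":
        """Conjugate update with binomial counts: add them to the shapes."""
        return Beta(self.a + successes, self.b + failures)

    @property
    def mean(self) -> float:
        return self.a / (self.a + self.b)


def stage_one(tp: int, fn: int, tn: int, fp: int,
              prior: Beta = Beta(1.0, 1.0)) -> tuple[Beta, Beta]:
    """Posteriors for sensitivity and specificity from validation counts.

    tp/fn: truly exposed subjects classified as exposed / unexposed.
    tn/fp: truly unexposed subjects classified as unexposed / exposed.
    """
    sensitivity = prior.update(tp, fn)
    specificity = prior.update(tn, fp)
    return sensitivity, specificity


# Hypothetical validation-sample counts (assumed, not taken from the paper).
sens_post, spec_post = stage_one(tp=40, fn=10, tn=45, fp=5)

# These posteriors would serve as the priors for the second stage.
print(sens_post, round(sens_post.mean, 3))
print(spec_post, round(spec_post.mean, 3))

# Passing a posterior forward as a prior loses nothing in this conjugate
# setting: updating in two batches matches updating on the pooled counts.
two_batches = Beta(1.0, 1.0).update(25, 5).update(15, 5)
pooled = Beta(1.0, 1.0).update(40, 10)
print(two_batches == pooled)
```

In the paper the second stage analyses the unvalidated main data and requires MCMC; that step is not sketched here, since it depends on model details the abstract does not give.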