Wang Qianqian, Ma Yanyuan, Wang Yuanjia
University of South Carolina, Penn State University and Columbia University.
Stat Sin. 2017 Oct;27(4):1857-1878. doi: 10.5705/ss.202016.0199.
Some biomedical studies lead to mixture data. When a discrete covariate defining subgroup membership is missing for some of the subjects in a study, the outcome for those subjects follows a mixture of the subgroup-specific distributions. Taking into account the uncertain distribution of the group membership and the covariates, we model the relation between the disease onset time and the covariates through transformation models in each sub-population, and develop a nonparametric maximum likelihood estimation procedure, implemented through an EM algorithm, together with its inference procedure. We further propose methods to identify the covariates that have different effects or common effects in distinct populations, which enables parsimonious modeling and a better understanding of the differences across populations. The methods are illustrated through extensive simulation studies and a real data example.
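To make the mixture structure concrete, the following is a minimal sketch and not the authors' estimator. It replaces the transformation models and the nonparametric maximum likelihood step with a much simpler setting, all of which are assumptions made for illustration: two groups, exponentially distributed onset times, no covariates, no censoring, and group labels missing completely at random. The function names `simulate` and `em_exponential_mixture` are hypothetical.

```python
import math
import random


def simulate(n, p, rate1, rate0, missing_prob, seed):
    """Draw onset times from two exponential groups and hide some group labels."""
    rng = random.Random(seed)
    times, labels = [], []
    for _ in range(n):
        group = 1 if rng.random() < p else 0
        times.append(rng.expovariate(rate1 if group == 1 else rate0))
        # A label is recorded as None when group membership is missing.
        labels.append(group if rng.random() >= missing_prob else None)
    return times, labels


def em_exponential_mixture(times, labels, n_iter=200):
    """EM for a two-group exponential mixture with partly missing labels.

    Returns (p, rate1, rate0): the estimated P(group = 1) and the two rates.
    """
    n = len(times)
    mean_t = sum(times) / n
    # Crude starting values (an arbitrary choice for this sketch).
    p, rate1, rate0 = 0.5, 2.0 / mean_t, 0.5 / mean_t
    for _ in range(n_iter):
        # E-step: probability of belonging to group 1 for each subject.
        weights = []
        for t, group in zip(times, labels):
            if group is not None:
                weights.append(float(group))  # observed label: no uncertainty
            else:
                d1 = p * rate1 * math.exp(-rate1 * t)
                d0 = (1.0 - p) * rate0 * math.exp(-rate0 * t)
                weights.append(d1 / (d1 + d0))
        # M-step: weighted maximum likelihood updates.
        total1 = sum(weights)
        p = total1 / n
        rate1 = total1 / sum(w * t for w, t in zip(weights, times))
        rate0 = (n - total1) / sum((1.0 - w) * t for w, t in zip(weights, times))
    return p, rate1, rate0


if __name__ == "__main__":
    times, labels = simulate(4000, p=0.4, rate1=2.0, rate0=0.5,
                             missing_prob=0.5, seed=1)
    print(em_exponential_mixture(times, labels))
```

Subjects with an observed label contribute to their own group only, while subjects with a missing label contribute to both groups in proportion to their posterior membership probability. This is the sense in which the outcome distribution for the unlabeled subjects is a mixture.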