Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, Maryland, USA.
Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada.
Stat Med. 2022 Jun 30;41(14):2513-2522. doi: 10.1002/sim.9367. Epub 2022 Mar 7.
Evaluating the genetic impact on a biological feature and separating it from environmental impact is challenging. This is usually done through twin studies, which assess the collective genetic impact, defined by the difference in correlation between monozygotic and dizygotic twins. Because the underlying order within a twin pair, determined by latent genetic factors, is unknown, the observed twin data are unordered, and conventional methods for estimating correlation are not appropriate. To handle the missing order, we model twin data with a bivariate mixture distribution and estimate the parameters under two likelihood functions: the likelihood for monozygotic and dizygotic twins separately, and the likelihood for the two twin types combined. Both likelihood estimators are consistent. More importantly, the combined likelihood overcomes a known drawback of mixture distribution estimation, namely slow convergence. It yields a root-n consistent estimator of the correlation coefficient and allows effective statistical inference on the collective genetic impact. The method is demonstrated with a twin study of immune traits.
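The idea of a likelihood for unordered pairs can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify the model details. The sketch assumes a bivariate normal component, an equal-weight mixture over the two possible orderings of each pair, and, for the combined likelihood, marginal parameters (means and standard deviations) shared between twin types with only the correlation differing. All function and parameter names are hypothetical.

```python
import math


def bvn_logpdf(x, y, mu1, mu2, s1, s2, rho):
    """Log density of a bivariate normal at the ordered point (x, y)."""
    zx = (x - mu1) / s1
    zy = (y - mu2) / s2
    one_minus_r2 = 1.0 - rho * rho
    quad = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / one_minus_r2
    return (-math.log(2.0 * math.pi * s1 * s2)
            - 0.5 * math.log(one_minus_r2)
            - 0.5 * quad)


def unordered_pair_loglik(pairs, mu1, mu2, s1, s2, rho):
    """Log-likelihood when the order within each pair is unknown.

    Assumption: each observed pair is (x, y) or (y, x) with probability
    1/2, so its density is an equal-weight mixture of the two orderings.
    """
    total = 0.0
    for x, y in pairs:
        a = bvn_logpdf(x, y, mu1, mu2, s1, s2, rho)
        b = bvn_logpdf(y, x, mu1, mu2, s1, s2, rho)
        m = max(a, b)  # log-sum-exp for numerical stability
        total += m + math.log(0.5 * math.exp(a - m) + 0.5 * math.exp(b - m))
    return total


def combined_loglik(mz_pairs, dz_pairs, mu1, mu2, s1, s2, rho_mz, rho_dz):
    """Combined log-likelihood over both twin types.

    Assumption: MZ and DZ twins share the marginal parameters and differ
    only in their correlation coefficient.
    """
    return (unordered_pair_loglik(mz_pairs, mu1, mu2, s1, s2, rho_mz)
            + unordered_pair_loglik(dz_pairs, mu1, mu2, s1, s2, rho_dz))
```

Under these assumptions, maximizing `combined_loglik` over all parameters with a numerical optimizer would give estimates of `rho_mz` and `rho_dz`, and their difference `rho_mz - rho_dz` corresponds to the differential correlation the abstract describes. By construction, swapping the two values within any pair leaves the likelihood unchanged, which is what makes it usable for unordered data.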