Tian Yuang, Zhang Hong, Bureau Alexandre, Hochner Hagit, Chen Jinbo
Shanghai Center for Mathematical Sciences, Fudan University, Shanghai, China.
Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei, Anhui, China.
J Stat Plan Inference. 2024 Dec;233. doi: 10.1016/j.jspi.2024.106190. Epub 2024 May 9.
Parent-of-origin effects play an important role in mammalian development and disease. Case-control mother-child pair genotype data can be used to detect parent-of-origin effects and are often convenient to collect in practice. Most existing methods for assessing parent-of-origin effects do not incorporate covariates, which may be needed to control for confounding factors. We propose to model the parent-of-origin effect through a logistic regression model whose predictors include maternal and child genotypes, parental origins, and covariates. Because parental origins cannot always be fully inferred from genotypes at the target genetic marker alone, we propose to use genotypes of markers tightly linked to the target marker to increase inference efficiency. A robust statistical inference procedure is developed based on a modified profile log-likelihood constructed retrospectively. A computationally feasible expectation-maximization algorithm is devised to estimate all unknown parameters in the modified profile log-likelihood. This algorithm differs from the conventional expectation-maximization algorithm in that it is based on the modified rather than the original profile log-likelihood function. Convergence of the algorithm is established under mild regularity conditions. The algorithm also allows convenient handling of missing child genotypes. Large-sample properties of the proposed estimator, including weak consistency, asymptotic normality, and asymptotic efficiency, are established under mild regularity conditions. Finite-sample properties are evaluated through extensive simulation studies and an application to a real dataset.
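To illustrate why parental origins may not be fully determined by genotypes at a single marker, here is a minimal Python sketch. It is not the authors' method. The function name `possible_origins` and the coding of genotypes as minor-allele counts (0, 1, 2) at a biallelic marker are assumptions made for illustration only. The sketch enumerates the (maternal allele, paternal allele) pairs in the child that are consistent with a given mother-child genotype pair, ignoring genotyping error and mutation.

```python
from __future__ import annotations

from itertools import product


def possible_origins(mother_g: int, child_g: int) -> set[tuple[int, int]]:
    """Return all (maternal_allele, paternal_allele) pairs for the child
    that are consistent with the mother's and child's genotypes.

    Genotypes are minor-allele counts (0, 1, or 2); alleles are coded
    0 (major) or 1 (minor). This coding is an illustrative assumption.
    """
    # Alleles the mother is able to transmit, given her genotype.
    transmittable = {0: {0}, 1: {0, 1}, 2: {1}}[mother_g]
    # Keep every maternal/paternal allele combination that adds up
    # to the child's observed genotype.
    return {
        (m, p)
        for m, p in product(transmittable, (0, 1))
        if m + p == child_g
    }


if __name__ == "__main__":
    for mother_g, child_g in product(range(3), repeat=2):
        origins = possible_origins(mother_g, child_g)
        if not origins:
            status = "Mendelian inconsistency"
        elif len(origins) == 1:
            status = "origin determined"
        else:
            status = "origin ambiguous"
        print(mother_g, child_g, sorted(origins), status)
```

Under this coding, the only ambiguous configuration is the one in which mother and child are both heterozygous. In that case the marker alone cannot tell which of the child's alleles came from the mother, and this is where genotypes at tightly linked markers can add information.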