Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts.
Department of Biostatistics, MD Anderson Cancer Center, Houston, Texas.
Biometrics. 2023 Jun;79(2):1472-1484. doi: 10.1111/biom.13629. Epub 2022 Mar 22.
Sample sizes vary substantially across tissues in the Genotype-Tissue Expression (GTEx) project, where considerably fewer samples are available from certain inaccessible tissues, such as the substantia nigra (SSN), than from accessible tissues, such as blood. This severely limits power for identifying tissue-specific expression quantitative trait loci (eQTL) in undersampled tissues. Here we propose Surrogate Phenotype Regression Analysis (Spray) for leveraging information from a correlated surrogate outcome (e.g., expression in blood) to improve inference on a partially missing target outcome (e.g., expression in SSN). Rather than regarding the surrogate outcome as a proxy for the target outcome, Spray jointly models the target and surrogate outcomes within a bivariate regression framework. Unobserved values of either outcome are treated as missing data. We describe and implement an expectation conditional maximization algorithm for performing estimation in the presence of bilateral outcome missingness. Spray estimates the same association parameter as standard eQTL mapping and controls the type I error rate even when the target and surrogate outcomes are truly uncorrelated. We demonstrate analytically and empirically, using simulations and GTEx data, that in comparison with marginally modeling the target outcome, jointly modeling the target and surrogate outcomes increases estimation precision and improves power.
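To make the joint-modeling idea concrete, the following is a minimal illustrative sketch, not the Spray implementation. It assumes a simplified setting in which both outcomes share one design matrix and missing values are coded as NaN. Under that assumption a plain EM iteration suffices, whereas the abstract describes an expectation conditional maximization algorithm. The function name `fit_bivariate_em`, the simulated effect sizes, and the missingness rates are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_bivariate_em(X, t, s, max_iter=5000, tol=1e-10):
    """EM for a bivariate normal regression with NaN-coded missing outcomes.

    Assumption (not from the paper): both outcomes share the design X.
    Returns (B, Sigma), where B is (p, 2) with columns [target, surrogate].
    """
    Y = np.column_stack([t, s]).astype(float)
    miss = np.isnan(Y)
    keep = ~miss.all(axis=1)  # rows with no observed outcome carry no information
    X, Y, miss = X[keep], Y[keep], miss[keep]
    n = X.shape[0]

    # Initialise each outcome separately from its observed rows.
    B = np.zeros((X.shape[1], 2))
    Sigma = np.eye(2)
    for j in range(2):
        obs = ~miss[:, j]
        B[:, j] = np.linalg.lstsq(X[obs], Y[obs, j], rcond=None)[0]
        Sigma[j, j] = np.var(Y[obs, j] - X[obs] @ B[:, j])

    for _ in range(max_iter):
        # E-step: replace each missing outcome by its conditional mean given
        # the observed outcome, and accumulate the conditional variances.
        mu = X @ B
        Y_fill = Y.copy()
        extra = np.zeros((2, 2))
        for j in range(2):
            k = 1 - j
            rows = miss[:, j]
            slope = Sigma[j, k] / Sigma[k, k]
            Y_fill[rows, j] = mu[rows, j] + slope * (Y[rows, k] - mu[rows, k])
            extra[j, j] = rows.sum() * (Sigma[j, j] - Sigma[j, k] ** 2 / Sigma[k, k])

        # M-step: with a shared design, the coefficient update is least squares
        # on the filled-in outcomes; the covariance adds the E-step variances.
        B_new = np.linalg.lstsq(X, Y_fill, rcond=None)[0]
        R = Y_fill - X @ B_new
        Sigma_new = (R.T @ R + extra) / n

        change = max(np.abs(B_new - B).max(), np.abs(Sigma_new - Sigma).max())
        B, Sigma = B_new, Sigma_new
        if change < tol:
            break
    return B, Sigma

# Simulated example (all values hypothetical): genotype effect 0.5 on the
# target, 0.3 on the surrogate, residual correlation 0.8.
rng = np.random.default_rng(0)
n = 2000
g = rng.binomial(2, 0.3, size=n)
X = np.column_stack([np.ones(n), g])
errs = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=n)
t = X @ np.array([0.0, 0.5]) + errs[:, 0]
s = X @ np.array([0.0, 0.3]) + errs[:, 1]
t[rng.random(n) < 0.8] = np.nan  # target mostly missing (undersampled tissue)
s[rng.random(n) < 0.1] = np.nan  # surrogate occasionally missing

B_hat, Sigma_hat = fit_bivariate_em(X, t, s)
print("joint estimate of target genotype effect:", B_hat[1, 0])
```

The target coefficient in the first column of `B_hat` is the same parameter that a marginal regression of the target on genotype would estimate. The surrogate enters only through the residual covariance, which is what allows subjects with a missing target to contribute information.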