Zhang Yu, Li Ming, Haas David M, Bairey Merz C Noel, Workalemahu Tsegaselassie, Ryckman Kelli, Catov Janet M, Levine Lisa D, Freedman Alexa, Saade George R, Li Xihao, Liu Nianjun, Yan Qi
Department of Epidemiology and Biostatistics, Indiana University School of Public Health-Bloomington, Bloomington, IN 47405, USA.
Department of Obstetrics and Gynecology, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
medRxiv. 2025 Aug 24:2025.08.20.25334100. doi: 10.1101/2025.08.20.25334100.
Mendelian randomization (MR) has become an important technique for inferring causal relationships between risk factors and health outcomes. By using genetic variants as instrumental variables, MR can mitigate the bias from confounding and reverse causation that affects observational studies. Current MR analyses have predominantly used common genetic variants as instruments, which capture only part of the genetic architecture of complex traits. Rare variants, which can have larger effect sizes and provide unique biological insights, have been understudied because of statistical and methodological challenges. We introduce MR-CARV, a novel framework that integrates common and rare genetic variants in two-sample Mendelian randomization. The method leverages the comprehensive genetic data made available by high-throughput sequencing technologies and large-scale consortia. Rare variants are aggregated into functional categories, such as gene-coding, gene-noncoding, and non-gene regions, with variant annotations and biological impact used as weights. The effects of the rare variant sets are then estimated with STAARpipeline and combined with the estimated effects of common variants using existing MR methods. Simulation studies demonstrate that MR-CARV maintains robust type I error control and achieves higher statistical power, with up to a 66.3% relative increase compared with existing methods based only on common variants. Consistent with these findings, an application to real data on high-density lipoprotein cholesterol (HDL-C) and preeclampsia showed that MR-CARV(IVW) yielded a more precise and statistically significant effect estimate (-0.021, SE = 0.0101, P = 0.0365) than IVW using only common variants (-0.024, SE = 0.0123, P = 0.0538).
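To illustrate the idea of combining the two kinds of instruments, the following is a minimal sketch of a standard fixed-effect inverse-variance weighted (IVW) estimator applied first to common variants alone and then to common variants plus aggregated rare-variant sets. It is not the MR-CARV implementation and does not call STAARpipeline; the function name `ivw_estimate` and all summary statistics are hypothetical values chosen for illustration, not results from the study. The sketch assumes independent instruments and treats each rare-variant set as a single instrument with its own effect estimate and standard error.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate from per-instrument summary statistics.

    Equivalent to a weighted regression through the origin of the outcome
    effects on the exposure effects, with weights 1 / se_outcome**2.
    Returns (estimate, standard error, two-sided normal p-value).
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(w * bx * by
                    for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    estimate = numerator / denominator
    std_error = math.sqrt(1.0 / denominator)
    z = estimate / std_error
    # Two-sided p-value under a standard normal reference distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return estimate, std_error, p_value


# Hypothetical summary statistics (illustrative only, not from the study).
# Common variants: one entry per variant.
common_bx = [0.12, 0.08, 0.15, 0.10]          # variant-exposure effects
common_by = [-0.030, -0.015, -0.040, -0.020]  # variant-outcome effects
common_se = [0.012, 0.010, 0.015, 0.011]      # SEs of the outcome effects

# Rare-variant sets: one entry per aggregated set (assumed to be
# already summarized to a single effect and SE per set).
rare_bx = [0.40, 0.55]
rare_by = [-0.09, -0.13]
rare_se = [0.05, 0.06]

est_common, se_common, p_common = ivw_estimate(common_bx, common_by, common_se)
est_all, se_all, p_all = ivw_estimate(
    common_bx + rare_bx, common_by + rare_by, common_se + rare_se
)

print(f"common only : estimate={est_common:.4f} SE={se_common:.4f} P={p_common:.4g}")
print(f"common+rare : estimate={est_all:.4f} SE={se_all:.4f} P={p_all:.4g}")
```

Because every added instrument contributes a non-negative term to the denominator, the fixed-effect IVW standard error can only shrink or stay the same when rare-variant sets are added, which is the kind of precision gain reported above. Whether that gain is valid in practice depends on the added instruments satisfying the usual MR assumptions.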