Sun Xianbang, Bulekova Katia, Yang Jian, Lai Meng, Pitsillides Achilleas N, Liu Xue, Zhang Yuankai, Guo Xiuqing, Yong Qian, Raffield Laura M, Rotter Jerome I, Rich Stephen S, Abecasis Goncalo, Carson April P, Vasan Ramachandran S, Bis Joshua C, Psaty Bruce M, Boerwinkle Eric, Fitzpatrick Annette L, Satizabal Claudia L, Arking Dan E, Ding Jun, Levy Daniel, Liu Chunyu
Department of Biostatistics, School of Public Health, Boston University, Boston, MA 02118, USA.
Research Computing Services, Boston University, Boston, MA 02215, USA.
medRxiv. 2024 Jan 13:2024.01.12.24301233. doi: 10.1101/2024.01.12.24301233.
We assessed a comprehensive association testing framework for heteroplasmy using both simulated and real-world data. The framework applies a variant allele fraction (VAF) threshold to identify heteroplasmy and uses multiple gene-based tests for association testing. In simulation studies, the gene-based tests maintained an appropriate type I error rate at α=0.001. When 5% or more of the heteroplasmic variants within a target region were associated with an outcome, the burden-extension tests (the adaptive burden test, the variable threshold burden test, and the z-score weighting burden test) outperformed the sequence kernel association test (SKAT) and the original burden test. Applying this framework, we conducted association analyses of whole-blood-derived heteroplasmy in 17,507 individuals of African and European ancestries (31% of African ancestry; mean age 62 years; 58% women) with whole genome sequencing data. We performed cohort- and ancestry-specific association analyses, followed by meta-analysis of the pooled samples and within each ancestry group. Our results suggest that mtDNA-encoded genes/regions are likely to exhibit varying rates of somatic aging: the original burden test showed notably strong associations (P<0.001) between advanced age and heteroplasmy in two mtDNA-encoded genes. In contrast, SKAT identified significant associations (P<0.001) between diabetes and the aggregated effects of heteroplasmy in several protein-coding genes. Further research is warranted to validate these findings. In summary, the proposed statistical framework is a valuable tool for association testing of heteroplasmy with disease traits in large human populations.
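To make the idea of a VAF threshold followed by a gene-based burden test concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes a heteroplasmy call is any variant whose VAF falls between 0.05 and 0.95 (illustrative cutoffs that the abstract does not state), collapses the calls within one gene into a per-individual count, and regresses a quantitative outcome on that count without covariates. The function names, the toy VAF matrix, and the ages are all hypothetical, and the sketch omits the covariate adjustment, burden-extension tests, SKAT, and meta-analysis described in the abstract.

```python
from math import sqrt


def heteroplasmy_calls(vaf_row, lower=0.05, upper=0.95):
    """Code each variant as 1 if its VAF lies in [lower, upper], else 0.

    The cutoffs are illustrative assumptions, not values from the abstract.
    """
    return [1 if lower <= vaf <= upper else 0 for vaf in vaf_row]


def burden_scores(vaf_matrix, lower=0.05, upper=0.95):
    """Count heteroplasmic variants per individual across one gene/region."""
    return [sum(heteroplasmy_calls(row, lower, upper)) for row in vaf_matrix]


def burden_test(scores, outcome):
    """Regress the outcome on the burden score (simple linear regression).

    Returns the slope and its t statistic (n - 2 degrees of freedom).
    """
    n = len(scores)
    mean_x = sum(scores) / n
    mean_y = sum(outcome) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, outcome))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(scores, outcome))
    std_err = sqrt(rss / (n - 2) / sxx)
    return slope, slope / std_err


# Hypothetical toy data: 5 individuals x 4 variant sites in one gene.
vaf = [
    [0.00, 0.01, 0.00, 0.99],
    [0.10, 0.00, 0.00, 0.00],
    [0.20, 0.30, 0.00, 1.00],
    [0.06, 0.50, 0.94, 0.02],
    [0.00, 0.07, 0.00, 0.00],
]
age = [40, 52, 61, 75, 50]

scores = burden_scores(vaf)
slope, t_stat = burden_test(scores, age)
print(scores, round(slope, 3), round(t_stat, 3))
```

Collapsing variants into a single count is what gives a burden test its power when many variants in the region act in the same direction, which is consistent with the simulation result reported above; a kernel test such as SKAT instead accommodates effects of mixed direction.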