Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA.
Research Computing Services, Boston University, Boston, MA 02215, USA.
Mitochondrion. 2024 Nov;79:101954. doi: 10.1016/j.mito.2024.101954. Epub 2024 Sep 7.
We rigorously assessed a comprehensive association testing framework for heteroplasmy using both simulated and real-world data. The framework applies a variant allele fraction (VAF) threshold to identify heteroplasmy and uses multiple gene-based tests for association testing. Our simulation studies demonstrated that the gene-based tests maintained an appropriate type I error rate at α = 0.001. Notably, when 5% or more of the heteroplasmic variants within a target region were associated with an outcome, the burden-extension tests (the adaptive burden test, variable threshold burden test, and z-score weighting burden test) outperformed the sequence kernel association test (SKAT) and the original burden test. Applying this framework, we conducted association analyses of whole-blood-derived heteroplasmy in 17,507 individuals of African and European ancestries (31% of African ancestry, mean age 62 years, 58% women) with whole genome sequencing data. We performed both cohort- and ancestry-specific association analyses, followed by meta-analysis of the pooled samples and within each ancestry group. Our results suggest that mtDNA-encoded genes/regions are likely to exhibit varying rates of somatic aging, with notably strong associations observed between heteroplasmy in the RNR1 and RNR2 genes and advancing age (p < 0.001) by the original burden test. In contrast, SKAT identified significant associations (p < 0.001) between diabetes and the aggregated effects of heteroplasmy in several protein-coding genes. Further research is warranted to validate these findings. In summary, our proposed statistical framework is a valuable tool for association testing of heteroplasmy with disease traits in large human populations.
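As a rough illustration of the two steps named in the abstract (a VAF threshold to call heteroplasmy, then a gene-based burden test), the following is a minimal sketch and not the authors' implementation. The 3% and 97% VAF bounds, the function names, the unit weights, and the normal-approximation p-value are all assumptions made for this example; the abstract does not state them. The burden-extension tests and SKAT are not shown.

```python
import math
import numpy as np


def heteroplasmy_indicator(vaf, lower=0.03, upper=0.97):
    """Mark sites as heteroplasmic (1) when lower <= VAF <= upper, else 0.

    The default bounds are illustrative assumptions, not values from the abstract.
    """
    vaf = np.asarray(vaf, dtype=float)
    return ((vaf >= lower) & (vaf <= upper)).astype(int)


def burden_score(het, weights=None):
    """Collapse a (samples x sites) indicator matrix into one score per sample."""
    het = np.asarray(het, dtype=float)
    if weights is None:
        weights = np.ones(het.shape[1])  # unweighted count of heteroplasmic sites
    return het @ np.asarray(weights, dtype=float)


def burden_test(score, outcome, covariates=None):
    """Ordinary least squares of outcome on the burden score.

    Includes an intercept and optional covariates. Returns (beta, z, p) for
    the burden score, with a two-sided p-value from a normal approximation.
    """
    score = np.asarray(score, dtype=float)
    y = np.asarray(outcome, dtype=float)
    n = y.shape[0]
    cols = [np.ones(n), score]
    if covariates is not None:
        cols.extend(np.asarray(covariates, dtype=float).reshape(n, -1).T)
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)       # covariance of the estimates
    z = beta[1] / math.sqrt(cov[1, 1])
    p = math.erfc(abs(z) / math.sqrt(2))        # two-sided normal tail
    return beta[1], z, p


# Toy data: 4 individuals, 3 sites within one hypothetical gene region.
vaf = np.array([
    [0.00, 0.10, 0.99],
    [0.05, 0.50, 0.00],
    [0.01, 0.02, 0.98],
    [0.40, 0.04, 0.30],
])
scores = burden_score(heteroplasmy_indicator(vaf))
```

In a real analysis the outcome model would also include covariates such as age, sex, and ancestry, and a binary trait such as diabetes would call for logistic rather than linear regression.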