Vanderwerff Brett, Pasternak Amy L, Fritsche Lars G, Bertucci-Richter Emily, Patil Snehal, Boehnke Michael, Zhou Xiang, Zöllner Sebastian, Hertz Daniel L, Zawistowski Matthew
Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA.
Department of Clinical Pharmacy, University of Michigan College of Pharmacy, University of Michigan, Ann Arbor, MI 48109, USA.
Genetics. 2025 Jul 9;230(3). doi: 10.1093/genetics/iyaf088.
Biobanks linking genetic data with clinical health records provide exciting opportunities for pharmacogenomic (PGx) research on genetic variation and drug response. Designed as central and multiuse resources, biobanks can facilitate diverse PGx research efforts, including the study of drug efficacy and adverse effects. Specialized PGx alleles and phenotypes are critical for such studies and can be conveniently called from existing array-based genotypes routinely collected in most biobanks. We describe a central callset of PGx alleles and phenotypes in over 80,000 participants of the Michigan Genomics Initiative (MGI) biobank, created using the PyPGx software on Trans-Omics for Precision Medicine-imputed genotypes. The array-based PGx allele calls demonstrate concordance (>92%) with a set of PCR-validated alleles collected during clinical care, but do not identify PGx alleles dependent on structural variation, including the clinically important CYP2D6*5 deletion. To address this, we developed a support vector machine trained on genotype array single nucleotide variant probe intensities to classify CYP2D6*5 carriers. This method had >99% accuracy and reclassified ∼7% of African American and ∼4% of White MGI participants to lower activity metabolizer phenotypes, predicting higher risks of adverse drug reactions. We demonstrate that central PGx callsets created with existing tools and genetic data can be augmented by customized calls for challenging alleles based on structural variants to broaden the research potential and clinical utility of biobanks. These PGx callsets can be created in biobanks with existing array-based genotype data and highlight the utility of advanced computational methods in PGx allele identification.
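The reclassification step described in the abstract can be illustrated with a minimal sketch of how a CYP2D6*5 deletion call might change a metabolizer phenotype. This is not the authors' pipeline and does not use PyPGx. The allele activity values and score thresholds follow the commonly used CPIC-style activity score convention as understood here. The function names and the rule that an apparently homozygous array call in a deletion carrier is really hemizygous (for example, *1/*1 becoming *1/*5) are assumptions made for illustration only.

```python
# Illustrative sketch only: hypothetical helper names, simplified allele table.
# Activity values follow the CPIC-style convention (assumed, not from the abstract).
ALLELE_ACTIVITY = {
    "*1": 1.0,    # normal function
    "*2": 1.0,    # normal function
    "*4": 0.0,    # no function
    "*5": 0.0,    # whole-gene deletion, no function
    "*10": 0.25,  # decreased function
    "*41": 0.5,   # decreased function
}


def activity_score(diplotype: str) -> float:
    """Sum the activity values of the two alleles in a diplotype like '*1/*5'."""
    first, second = diplotype.split("/")
    return ALLELE_ACTIVITY[first] + ALLELE_ACTIVITY[second]


def phenotype(score: float) -> str:
    """Map an activity score to a metabolizer phenotype (assumed thresholds)."""
    if score == 0:
        return "Poor Metabolizer"
    if score <= 1.0:
        return "Intermediate Metabolizer"
    if score <= 2.25:
        return "Normal Metabolizer"
    return "Ultrarapid Metabolizer"


def apply_deletion_call(diplotype: str, carries_deletion: bool) -> str:
    """Replace one allele with *5 when a separate classifier flags a deletion.

    Assumption: an array cannot see the deletion, so a hemizygous carrier
    looks homozygous. Only that apparent-homozygote case is rewritten here;
    heterozygous calls are left unchanged because they are ambiguous.
    """
    first, second = diplotype.split("/")
    if carries_deletion and first == second:
        return f"{first}/*5"
    return diplotype


if __name__ == "__main__":
    array_call = "*1/*1"
    corrected = apply_deletion_call(array_call, carries_deletion=True)
    print(array_call, "->", phenotype(activity_score(array_call)))
    print(corrected, "->", phenotype(activity_score(corrected)))
```

Under these assumptions, a participant called *1/*1 from array data alone is a normal metabolizer, while the same participant flagged as a deletion carrier becomes *1/*5 with an activity score of 1.0, which falls in the intermediate range. That is the kind of shift to a lower activity phenotype the abstract reports.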