McGrouther Caroline C, Rangan Aaditya V, Di Florio Arianna, Elman Jeremy A, Schork Nicholas J, Kelsoe John
Courant Institute of Mathematical Sciences, New York University, New York, NY, United States of America.
School of Medicine, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, United Kingdom.
ArXiv. 2024 Oct 27:arXiv:2405.00159v2.
Bipolar Disorder (BD) is a complex disease that is heterogeneous at both the phenotypic and the genetic level, although the extent and impact of this heterogeneity are not fully understood. One way to assess this heterogeneity is to look for patterns in subphenotype data. Because BD studies have collected phenotypic data in different ways over the years, homogenizing these subphenotype data is challenging, and so is replication. The alternative taken here is to set aside the intricacies of subphenotypes and let the genetic data themselves determine which subjects form a homogeneous genetic subgroup (termed a 'bicluster' below).
In this paper, we leverage recent advances in heterogeneity analysis to look for genetically driven subgroups (i.e., biclusters) within the broad phenotype of Bipolar Disorder. We first apply a covariate-corrected biclustering algorithm to a cohort of 2524 BD cases and 4106 controls from the Bipolar Disorder Research Network (BDRN) within the Psychiatric Genomics Consortium (PGC). We find evidence of genetic heterogeneity in the form of a statistically significant bicluster: a subset of BD cases that exhibits a disease-specific pattern of differential expression across a subset of SNPs. This disease-specific genetic pattern (i.e., the 'genetic subgroup') replicates in the remaining PGC datasets, which contain 5781/8289, 3581/7591, and 6825/9752 cases/controls, respectively. This genetic subgroup, discovered without using any BD subtype information, was more prevalent in Bipolar type I than in Bipolar type II.
Our methodology has identified a replicable, homogeneous genetic subgroup of bipolar disorder. This subgroup may represent a collection of correlated genetic risk factors for Bipolar type I (BDI). By investigating a bicluster-informed polygenic risk score (PRS) for this subgroup, we find that the disease-specific pattern highlighted by the bicluster can be leveraged to eliminate noise from our GWAS analyses and improve risk prediction. The improvement is particularly notable when only a relatively small subset of the available SNPs is used, implying improved SNP replication. Although our primary focus is the analysis of disease-related signal, we also identify replicable control-related heterogeneity.
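The abstract does not specify how the bicluster-informed PRS is constructed. For orientation only, a conventional additive PRS scores subject $i$ as $\mathrm{PRS}_i = \sum_j \hat\beta_j g_{ij}$, where $g_{ij}$ is the allele dosage (0, 1 or 2) at SNP $j$ and $\hat\beta_j$ is a per-SNP effect size, for example a GWAS log-odds estimate. The minimal sketch below assumes NumPy and adds a hypothetical boolean `snp_mask` standing in for a restricted SNP subset, such as one a bicluster might highlight. It is not the authors' bicluster-informed scoring procedure.

```python
import numpy as np


def polygenic_risk_score(genotypes, effect_sizes, snp_mask=None):
    """Conventional additive PRS: sum over SNPs of effect size times dosage.

    genotypes:    (n_subjects, n_snps) allele dosages, assumed coded 0/1/2
    effect_sizes: (n_snps,) per-SNP weights, e.g. GWAS log-odds estimates
    snp_mask:     optional (n_snps,) boolean array; hypothetical stand-in for
                  a SNP subset (not the paper's bicluster-informed method)
    """
    genotypes = np.asarray(genotypes, dtype=float)
    effect_sizes = np.asarray(effect_sizes, dtype=float)
    if snp_mask is not None:
        # Keep only the selected SNP columns and their matching weights.
        snp_mask = np.asarray(snp_mask, dtype=bool)
        genotypes = genotypes[:, snp_mask]
        effect_sizes = effect_sizes[snp_mask]
    # Matrix-vector product gives one score per subject.
    return genotypes @ effect_sizes


# Toy usage: two subjects, three SNPs (made-up numbers for illustration).
G = np.array([[0, 1, 2], [2, 0, 1]])
beta = np.array([0.1, -0.2, 0.3])
full_scores = polygenic_risk_score(G, beta)
subset_scores = polygenic_risk_score(G, beta, snp_mask=[True, False, True])
```

Restricting the sum to a SNP subset is the simplest way to see how a score built on fewer, better-chosen SNPs can differ from one built on all of them; how the paper selects and weights SNPs is not described here.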