Sardell J M, Das S, Møller G L, Sanna M, Chocian K, Taylor K, Malinowski A R, Stubberfield C, Rochlin A, Gardner S
PrecisionLife Ltd., Unit 8b Bankside, Hanborough Business Park, Long Hanborough OX29 8LJ, UK.
Complex Disorders Alliance, 2299 Summer St., Stamford, CT 06905, USA.
medRxiv. 2025 Aug 15:2025.08.13.25333595. doi: 10.1101/2025.08.13.25333595.
Endometriosis affects about 10% of women, usually of reproductive age. It often has severe negative impacts on patients' quality of life, but the average time to a definitive diagnosis remains 7-9 years, and there are few effective therapeutic options. Relatively little is known about the genetic drivers of the disease, even though its heritability is fairly high. A recent large genome-wide association study (GWAS) meta-analysis identified 42 genomic loci associated with risk of endometriosis, but together these explain only 5% of disease variance.
We used the PrecisionLife combinatorial analytics platform to identify multi-SNP disease signatures significantly associated with endometriosis in a white European UK Biobank (UKB) cohort. We assessed the reproducibility of these multi-SNP disease signatures as well as 35 of the 42 SNPs identified by a recent meta-GWAS study in a multi-ancestry American endometriosis cohort from All of Us (AoU) after controlling for population structure.
We identified 1,709 disease signatures, comprising 2,957 unique SNPs in combinations of 2-5 SNPs, that were associated with increased prevalence of endometriosis in UKB. We observed a significant enrichment of these signatures (58-88%, p<0.04) that are also positively associated with endometriosis in the AoU cohort, including one 2-SNP signature that is individually significant. Reproducibility rates were greatest for higher-frequency signatures, ranging from 80% to 88% for signatures with greater than 9% frequency (p<0.01) in AoU. Encouragingly, the disease signatures also show high reproducibility rates in non-white European AoU sub-cohorts (66-76%, p<0.04 for signatures with greater than 4% frequency). A total of 195 unique SNPs mapping to 100 genes were identified in the high-frequency reproducing signatures (>9%). Of these, 4 genes were previously identified in the endometriosis meta-GWAS study and 19 genes have a previous association with endometriosis in OpenTargets. 77 novel genes were identified in this study. We characterized 9 novel genes that occur at the highest frequency in reproducing signatures and that do not contain any SNPs linked to known GWAS genes, providing new evidence for links between endometriosis and autophagy and macrophage biology. Reproducibility rates, ranging from 73% to 85%, are especially strong for the signatures that contain these 9 genes independently of any SNPs mapping to the meta-GWAS genes. These genes also include several targets novel to endometriosis with credible therapeutic discovery, repurposing and/or repositioning potential.
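As an illustration only, the reproducibility enrichment reported above can be framed as a one-sided sign test: if signatures found in the discovery cohort had no real effect, roughly half would show a positive association in the replication cohort by chance. The abstract does not state which test the authors used, so the sketch below is an assumption rather than their method. The function name `binomial_sign_test` and the counts are hypothetical, and treating signatures as independent is a simplification, since signatures that share SNPs are correlated.

```python
from math import comb


def binomial_sign_test(n_positive: int, n_total: int, p_null: float = 0.5) -> float:
    """One-sided p-value P(X >= n_positive) for X ~ Binomial(n_total, p_null).

    Hypothetical helper: under the null, each signature is assumed to show a
    positive association in the replication cohort with probability p_null.
    """
    if not 0 <= n_positive <= n_total:
        raise ValueError("n_positive must lie between 0 and n_total")
    # Sum the binomial probability mass over the upper tail.
    return sum(
        comb(n_total, k) * p_null**k * (1 - p_null) ** (n_total - k)
        for k in range(n_positive, n_total + 1)
    )


# Hypothetical counts, NOT taken from the study: 40 of 50 tested
# signatures show a positive association in the replication cohort.
n_total = 50
n_positive = 40
rate = n_positive / n_total
p_value = binomial_sign_test(n_positive, n_total)
print(f"reproducibility rate = {rate:.0%}, one-sided p = {p_value:.2e}")
```

In practice, correlation between signatures and population structure would need to be handled, for example by permutation of case-control labels, before a p-value of this kind could be interpreted.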
Although using much smaller, less well-characterized datasets than the previous whole-genome meta-GWAS study, combinatorial analysis has provided important new insights into the genetics and biology of endometriosis. The finding of 77 novel gene associations that have high frequency and reproduce in an independent, ancestrally diverse dataset demonstrates that combinatorial analysis can identify biologically relevant genes that are overlooked by GWAS approaches. Several of these novel genes are credible targets for drug discovery and repurposing, as shown by the examples highlighted. The broad reproducibility of results across datasets and ancestries suggests that combinatorial disease signatures can be used to identify different mechanistic etiologies that have the potential to inform precision medicine-based approaches and generate new clinical treatments for this complex disease.