Smail Alice, Al-Jawahiri Reem, Baker Kate
MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK.
Department of Medical & Molecular Genetics, King's College London, London, UK.
Eur J Hum Genet. 2025 Jan 22. doi: 10.1038/s41431-025-01784-2.
Polycomb group (PcG) and Trithorax group (TrxG) complexes represent two major components of the epigenetic machinery. This study aimed to delineate phenotypic similarities and differences across developmental conditions arising from rare variants in PcG and TrxG genes, using data-driven approaches. A total of 462 patients with a PcG- or TrxG-associated condition were identified in the DECIPHER dataset. We analysed Human Phenotype Ontology (HPO) data to identify phenotypes enriched in this group compared with other monogenic conditions within DECIPHER. We then assessed phenotypic relationships between single-gene diagnoses within the PcG and TrxG group by applying semantic similarity analysis and hierarchical clustering. Finally, we analysed patient-level phenotypic heterogeneity in this group, irrespective of specific genetic diagnosis, by applying the same clustering approach. Collectively, PcG/TrxG diagnoses were associated with increased reporting of HPO terms relating to integument, growth, head and neck, limb and digestive abnormalities. Gene-level analysis identified three multi-gene clusters differentiated by microcephaly, limb/digit dysmorphologies, growth abnormalities and atypical behavioural phenotypes. Patient-level analysis identified two large clusters, differentiated by neurodevelopmental abnormalities and facial dysmorphologies respectively, as well as smaller clusters associated with more specific phenotypes including behavioural characteristics, eye abnormalities, growth abnormalities and skull dysmorphologies. Importantly, patient-level phenotypic clusters did not align with genetic diagnoses. Data-driven approaches can highlight pathway-level and gene-level phenotypic convergences, as well as individual-level phenotypic heterogeneities. Future studies are needed to understand the multi-level mechanisms contributing to both convergence and variability within this population, and to extend data collection and analyses to later-emerging health characteristics.
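The abstract names semantic similarity analysis followed by hierarchical clustering but does not specify the similarity measure or the linkage method. The sketch below is therefore only an illustration of the general idea, under stated assumptions: the `PARENT` hierarchy and the patient profiles are invented toy data (not the HPO graph or DECIPHER records), similarity is a Jaccard index over ancestor-closed term sets (a simple stand-in for an ontology-aware measure), and clustering uses average linkage.

```python
from itertools import combinations

# Toy is-a hierarchy (child -> parent). Hypothetical labels, NOT the real HPO graph.
PARENT = {
    "microcephaly": "head_neck",
    "facial_dysmorphism": "head_neck",
    "short_stature": "growth",
    "overgrowth": "growth",
    "head_neck": "phenotypic_abnormality",
    "growth": "phenotypic_abnormality",
}

# Invented patient profiles for illustration only.
PROFILES = {
    "patient_A": {"microcephaly"},
    "patient_B": {"facial_dysmorphism", "microcephaly"},
    "patient_C": {"short_stature"},
    "patient_D": {"overgrowth", "short_stature"},
}


def ancestors(term):
    """Return the term together with all of its ancestors in the toy hierarchy."""
    found = {term}
    while term in PARENT:
        term = PARENT[term]
        found.add(term)
    return found


def closure(terms):
    """Expand a set of terms to include every ancestor of every term."""
    expanded = set()
    for term in terms:
        expanded |= ancestors(term)
    return expanded


def similarity(terms_a, terms_b):
    """Jaccard index of the ancestor-closed term sets (assumed measure)."""
    a, b = closure(terms_a), closure(terms_b)
    return len(a & b) / len(a | b)


def cluster(profiles, n_clusters):
    """Average-linkage agglomerative clustering on distance = 1 - similarity."""
    clusters = [[name] for name in profiles]

    def distance(c1, c2):
        pairs = [(x, y) for x in c1 for y in c2]
        total = sum(1 - similarity(profiles[x], profiles[y]) for x, y in pairs)
        return total / len(pairs)

    while len(clusters) > n_clusters:
        # Merge the two closest clusters; i < j always holds for combinations().
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    print(similarity(PROFILES["patient_A"], PROFILES["patient_B"]))
    print(cluster(PROFILES, 2))
```

Expanding each profile to its ancestors lets two patients annotated with different but related terms still register as partly similar, which is the motivation for ontology-aware measures over exact term matching. The same routine could in principle be applied to gene-level profiles (terms pooled per diagnosis) or to individual patients, mirroring the two levels of analysis described in the abstract.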