Triantafyllidis Charalampos P, Aguas Ricardo
Nuffield Department of Medicine, University of Oxford, Oxford, UK.
NPJ Syst Biol Appl. 2025 Aug 12;11(1):92. doi: 10.1038/s41540-025-00567-1.
We employ a computational framework that integrates mathematical programming and Graph Neural Networks (GNNs) to elucidate functional phenotypic heterogeneity in disease by classifying entire pathways under various conditions of interest. Our approach combines two distinct, yet seamlessly integrated, modeling schemes. First, we leverage Prior Knowledge Networks (PKNs) to reconstruct gene networks from genomic and transcriptomic data. We demonstrate how this can be achieved through mathematical programming optimization and provide examples using comprehensive, established databases. Second, we tailor GNNs to classify each reconstructed network as a single data point at the graph level, using various node embeddings and edge attributes. These networks may differ in their biological or molecular annotations, which serve as the labels for supervised classification. We apply the framework to the human DNA damage and repair pathway, using the TP53 regulon in a pan-cancer study across cell lines and tumor samples, to classify Gene Regulatory Networks (GRNs) by TP53 mutation type. This allows us to identify mutations with distinguishable functional profiles that can be related to specific phenotypes, thus providing a data-driven pipeline for genotype-to-phenotype translation. The approach is scalable: it enables classification of diverse conditions in multi-factorial diseases and helps disentangle their polygenic complexity by revealing new functional patterns through a causal representation.
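To make the graph-level classification step concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It treats one small signed regulatory network as a single data point: node features are updated by sign-aware message passing over the edges, pooled into one graph vector, and mapped to class probabilities. The function names, the feature values, the untrained weights, and the two output classes are all illustrative assumptions; the abstract does not specify the GNN architecture, the node embeddings, or the edge attributes actually used.

```python
import math

def message_pass(node_feats, edges):
    """One round of sign-aware mean aggregation over incoming edges.

    node_feats: dict mapping node name -> list of floats (toy node embedding)
    edges: list of (source, target, sign); sign is +1 for activation and
           -1 for inhibition (a stand-in for an edge attribute)
    """
    dim = len(next(iter(node_feats.values())))
    updated = {}
    for node, feats in node_feats.items():
        incoming = [(src, sign) for src, tgt, sign in edges if tgt == node]
        agg = [0.0] * dim
        for src, sign in incoming:
            for k in range(dim):
                agg[k] += sign * node_feats[src][k]
        if incoming:
            agg = [a / len(incoming) for a in agg]
        # Combine the node's own state with its aggregated messages.
        updated[node] = [math.tanh(f + a) for f, a in zip(feats, agg)]
    return updated

def readout(node_feats):
    """Mean-pool node embeddings into a single graph-level vector."""
    dim = len(next(iter(node_feats.values())))
    n = len(node_feats)
    return [sum(f[k] for f in node_feats.values()) / n for k in range(dim)]

def classify_graph(node_feats, edges, weights, bias, rounds=2):
    """Return softmax class probabilities for one whole graph."""
    h = node_feats
    for _ in range(rounds):
        h = message_pass(h, edges)
    g = readout(h)
    logits = [sum(w * x for w, x in zip(row, g)) + b
              for row, b in zip(weights, bias)]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy three-gene network around TP53; feature values are made up.
node_feats = {"TP53": [1.0, 0.0], "MDM2": [0.5, 0.2], "CDKN1A": [0.1, 0.9]}
edges = [("TP53", "MDM2", +1), ("TP53", "CDKN1A", +1), ("MDM2", "TP53", -1)]

# Arbitrary, untrained weights for two hypothetical mutation classes.
weights = [[0.8, -0.3], [-0.5, 0.6]]
bias = [0.0, 0.1]

probs = classify_graph(node_feats, edges, weights, bias)
print(probs)
```

In a real setting the weights would be learned from many labeled networks, and the aggregation and readout would typically come from an established GNN library rather than hand-written loops. The point of the sketch is only that the label attaches to the whole graph, not to individual genes.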