INSA Lyon, INRAE, BF2I, UMR 203, Université de Lyon, 69100 Villeurbanne, France.
INRAE, INSA Lyon, BF2I, UMR 203, Université de Lyon, 69100 Villeurbanne, France.
Biomolecules. 2023 Mar 13;13(3):526. doi: 10.3390/biom13030526.
Self-expressiveness is a mathematical property that characterizes the relationships between instances in a dataset. This property has been applied widely and successfully in computer-vision tasks and time-series analysis, and to infer underlying network structures in domains such as protein signaling interactions and social-network activity. Nevertheless, despite its potential, self-expressiveness has not been explicitly used to infer gene networks. In this article, we present Generalizable Gene Self-Expressive Networks, a new, interpretable, and generalization-aware formalism to model gene networks, and we propose two methods, GXN•EN and GXN•OMP, based respectively on ElasticNet and OMP (Orthogonal Matching Pursuit), to infer and assess Generalizable Gene Self-Expressive Networks. We evaluate these methods on four microarray datasets from the DREAM5 benchmark, using both internal and external metrics. Both methods obtain results comparable to those of state-of-the-art tools, while being fast to train and producing highly sparse networks, which makes them easier to interpret. Moreover, we applied these methods to three complex datasets containing RNA-seq information from different mammalian tissues and cell types. Lastly, we applied our methodology to compare a normal condition with a disease condition (Alzheimer's disease), which allowed us to detect differential expression of gene sub-networks between these two biological conditions. Overall, the gene networks obtained exhibit a sparse and modular structure, with inner communities of genes showing statistically significant over- or under-expression in specific cell types, as well as significant enrichment for some anatomical GO terms, suggesting that such communities may also play important functional roles.
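To make the idea of self-expressiveness concrete, the sketch below expresses each gene as a sparse linear combination of the other genes and then checks how well those weights reconstruct held-out samples. It is an illustration only, not the authors' GXN•OMP implementation: the hand-rolled `omp` routine, the function names `self_expressive_network` and `heldout_r2`, the synthetic data, and the choice of held-out R² as the generalization measure are all assumptions made for this example.

```python
import numpy as np


def omp(A, y, n_nonzero):
    """Greedy OMP: approximate y using at most n_nonzero columns of A."""
    coef = np.zeros(A.shape[1])
    support = []
    residual = y.copy()
    for _ in range(min(n_nonzero, A.shape[1])):
        # Select the unused column most correlated with the current residual.
        scores = np.abs(A.T @ residual)
        if support:
            scores[support] = -np.inf
        support.append(int(np.argmax(scores)))
        # Refit all selected columns jointly by least squares.
        sol, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ sol
        coef[:] = 0.0
        coef[support] = sol
    return coef


def self_expressive_network(X, n_nonzero=2):
    """X is samples x genes. Column j of the result holds the weights
    that express gene j from the other genes (the diagonal stays zero)."""
    n_genes = X.shape[1]
    C = np.zeros((n_genes, n_genes))
    for j in range(n_genes):
        others = [i for i in range(n_genes) if i != j]
        C[others, j] = omp(X[:, others], X[:, j], n_nonzero)
    return C


def heldout_r2(X_test, C):
    """Per-gene R^2 when each held-out gene is predicted from the others via C."""
    pred = X_test @ C
    ss_res = ((X_test - pred) ** 2).sum(axis=0)
    ss_tot = ((X_test - X_test.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot


# Synthetic expression matrix (hypothetical data): gene 2 depends on genes 0 and 1,
# while genes 3 and 4 are independent noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 5))
X[:, 2] = 0.5 * X[:, 0] + 2.0 * X[:, 1] + 0.05 * rng.standard_normal(300)
X_train, X_test = X[:200], X[200:]

C = self_expressive_network(X_train, n_nonzero=2)
r2 = heldout_r2(X_test, C)
print(np.round(C[:, 2], 2))  # weights expressing gene 2 from the other genes
print(np.round(r2, 2))       # held-out R^2 per gene
```

In this toy setting, the nonzero entries of `C` can be read as weighted edges of a gene network, and the held-out scores indicate which of those edges reflect relationships that carry over to unseen samples rather than fits to noise.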