Peach Robert L, Arnaudon Alexis, Schmidt Julia A, Palasciano Henry A, Bernier Nathan R, Jelfs Kim E, Yaliraki Sophia N, Barahona Mauricio
Department of Mathematics, Imperial College London, SW7 2AZ London, UK.
Blue Brain Project, École polytechnique fédérale de Lausanne (EPFL), Campus Biotech, 1202 Geneva, Switzerland.
Patterns (N Y). 2021 Apr 2;2(4):100227. doi: 10.1016/j.patter.2021.100227. eCollection 2021 Apr 9.
Networks are widely used as mathematical models of complex systems across many scientific disciplines. Decades of work have produced a vast corpus of research characterizing the topological, combinatorial, statistical, and spectral properties of graphs. Each graph property can be thought of as a feature that captures important (and sometimes overlapping) characteristics of a network. In this paper, we introduce HCGA, a framework for highly comparative analysis of graph datasets that computes several thousand graph features from any given network. HCGA also offers a suite of statistical learning and data analysis tools for the automated identification and selection of important and interpretable features that characterize graph datasets. We show that HCGA outperforms other methodologies on supervised classification tasks on benchmark datasets while retaining the interpretability of network features. We illustrate HCGA with two applications: predicting charge transfer in organic semiconductors and clustering a dataset of neuronal morphology images.
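To make the idea of treating graph properties as features concrete, the following is a minimal sketch in plain Python. It is not HCGA's implementation or API: the function names (`graph_features`, `from_edges`), the adjacency-dictionary input format, and the handful of features chosen are all assumptions made for illustration, and HCGA itself computes several thousand features rather than six. The sketch assumes non-empty, undirected, simple graphs.

```python
from itertools import combinations
from statistics import mean, pstdev


def from_edges(edges):
    """Build an undirected adjacency dict {node: set(neighbours)} (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def graph_features(adj):
    """Return a small, interpretable feature dict for one graph (illustrative only)."""
    n = len(adj)
    degrees = [len(nbrs) for nbrs in adj.values()]
    m = sum(degrees) // 2  # every edge is counted once from each endpoint

    # Local clustering: fraction of a node's neighbour pairs that are linked.
    clustering = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            clustering.append(0.0)
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        clustering.append(2 * links / (k * (k - 1)))

    return {
        "n_nodes": n,
        "n_edges": m,
        "density": 2 * m / (n * (n - 1)) if n > 1 else 0.0,
        "degree_mean": mean(degrees),
        "degree_std": pstdev(degrees),
        "clustering_mean": mean(clustering),
    }


# Two toy graphs: a triangle and a 4-node path.
triangle = from_edges([(0, 1), (1, 2), (2, 0)])
path = from_edges([(0, 1), (1, 2), (2, 3)])

# One row per graph; each column is a named, interpretable feature.
feature_table = [graph_features(g) for g in (triangle, path)]
```

Each row of `feature_table` could then be passed to any standard classifier or clustering method. Because every column is a named graph property, whichever features turn out to be most informative can be read off directly, which is the kind of interpretability the abstract refers to.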