School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China.
J Biomed Inform. 2023 Sep;145:104479. doi: 10.1016/j.jbi.2023.104479. Epub 2023 Aug 25.
Biological networks are known to be highly modular, and the dysfunction of network modules may cause disease. Defining key modules from omics data and building classification models on them helps advance research on disease diagnosis and prognosis. However, when modules are applied in downstream analyses such as discriminating disease states, most methods use only node information and ignore node interactions and topological information, which may lead to false positives and limit model performance. In this study, we propose LCNet, an omics data analysis method based on feature linear relationships and a graph convolutional network. LCNet uses the differences in feature linear relationships that arise during disease development to characterize physiological and pathological changes, and constructs a differential linear relation network that is simple and interpretable from the perspective of feature linear relationships. A greedy strategy is developed to search for highly interactive modules with strong discriminative ability. To fully exploit the information in the detected modules, a personalized sub-graph is defined for each sample based on the modules, and graph convolutional network (GCN) classifiers are trained on these sub-graphs to predict sample labels. Experimental results on public datasets show the superiority of LCNet in classification performance. For breast cancer metabolic data, the metabolites identified by LCNet are involved in important pathways. Thus, LCNet can identify module biomarkers through feature linear relationships and a greedy strategy, and can label samples through personalized sub-graphs and GCNs. It provides a new way of using both node (molecule) information and topological information in the defined modules for better disease classification.
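The pipeline described above (differential network, greedy module search, personalized sub-graph, GCN) can be illustrated with a small sketch. The abstract does not give LCNet's formulas, so everything below is an assumption made for illustration: the absolute difference of Pearson correlations between two groups stands in for the "difference of feature linear relationships", the greedy rule and its `min_gain` threshold are hypothetical, and the GCN step is the standard symmetric-normalized propagation with random, untrained weights. The data are synthetic.

```python
import numpy as np

def differential_network(x_a, x_b):
    """Assumed stand-in: edge weight = |corr in group A - corr in group B|."""
    diff = np.abs(np.corrcoef(x_a, rowvar=False) - np.corrcoef(x_b, rowvar=False))
    np.fill_diagonal(diff, 0.0)
    return diff

def greedy_module(weights, seed, max_size=4, min_gain=0.5):
    """Hypothetical greedy rule: repeatedly add the node whose mean edge
    weight to the current module is largest, until it drops below min_gain."""
    module = [seed]
    while len(module) < max_size:
        candidates = [j for j in range(weights.shape[0]) if j not in module]
        if not candidates:
            break
        scores = [weights[j, module].mean() for j in candidates]
        best = int(np.argmax(scores))
        if scores[best] < min_gain:
            break
        module.append(candidates[best])
    return module

def gcn_layer(adj, feats, w):
    """One standard GCN propagation: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ feats @ w, 0.0)

# Synthetic data: features 0-2 are linearly coupled in controls only.
rng = np.random.default_rng(0)
n = 200
control = rng.normal(size=(n, 6))
control[:, 1] = control[:, 0] + 0.1 * rng.normal(size=n)
control[:, 2] = control[:, 0] + 0.1 * rng.normal(size=n)
disease = rng.normal(size=(n, 6))  # the coupling is lost in disease

weights = differential_network(control, disease)
module = greedy_module(weights, seed=0)

# Personalized sub-graph for one sample: module nodes carry that sample's
# feature values; edges come from the differential network.
sample = disease[0]
sub_adj = weights[np.ix_(module, module)]
node_feats = sample[module][:, None]           # shape (nodes, 1)
w = rng.normal(size=(1, 4))                    # random, untrained weights
hidden = gcn_layer(sub_adj, node_feats, w)     # shape (nodes, 4)
graph_embedding = hidden.mean(axis=0)          # simple mean readout
```

In a full classifier the weight matrix would be learned, and the graph-level embedding would feed a final layer that predicts the sample label. The mean readout here is only one possible choice.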