Naseri Atefeh, Sharghi Mehran, Hasheminejad Seyed Mohammad Hossein
Department of Computer Engineering, Alzahra University, Tehran, Iran.
Comput Biol Chem. 2021 Dec;95:107589. doi: 10.1016/j.compbiolchem.2021.107589. Epub 2021 Oct 6.
Gene regulatory network (GRN) reconstruction, one of the main research topics in computational biology, refers to inferring the relationships between genes involved in regulating cell conditions in response to internal or external stimuli. Most computational methods use only transcriptional gene expression data to reconstruct GRNs, but recent studies suggest that gene expression data must be integrated with other types of data to obtain more accurate models of the real relationships between genes. In this study, a diffusion-based method is enhanced to integrate network-type biological data in addition to structural prior knowledge. The Random Walk with Restart (RWR) algorithm, with an emphasis on hub nodes, is executed separately on each network, and low-dimensional feature vectors for the network nodes are then jointly optimized by diffusion component analysis. These feature vectors are then used to infer gene regulatory networks. Fourteen centrality measures are studied for detecting the hub nodes used in the RWR algorithm, and the centrality measure with the greatest effect on improving gene network inference is selected. A case study on the Saccharomyces cerevisiae and E. coli networks shows that using the proposed features, compared with gene expression data alone, improves the Area Under the Receiver Operating Characteristic curve (AUROC) by 0.02-0.08 across different GRN inference methods. Furthermore, the proposed method was applied to esophageal cancer data to infer its gene regulatory network. The proposed framework substantially improves the accuracy and scalability of GRN inference. The fused features and the best centrality measure detected can be used to provide functional insights about genes or proteins in various biological applications. Moreover, the method can serve as a general framework for network and structural data integration and analysis problems in various scientific disciplines, including biology.
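The Random Walk with Restart step mentioned in the abstract can be illustrated with a minimal sketch. This is a generic textbook RWR on a single network, not the authors' implementation: the function name `random_walk_with_restart`, the `restart` value, and the optional `node_weight` argument (a hypothetical way to bias the walk toward high-centrality nodes) are all assumptions made for illustration, since the abstract does not specify how hub emphasis is applied.

```python
import numpy as np


def random_walk_with_restart(adj, seed, restart=0.5, node_weight=None,
                             tol=1e-10, max_iter=1000):
    """Return the stationary RWR distribution for a walk restarting at `seed`.

    adj         : (n, n) non-negative adjacency matrix; adj[i, j] is the
                  weight of the edge from node j to node i.
    seed        : index of the restart node.
    restart     : probability of jumping back to the seed at each step.
    node_weight : optional length-n scores (e.g. a centrality measure).
                  ASSUMPTION: edges into node i are scaled by node_weight[i]
                  to favour hubs; the paper's actual scheme may differ.
    """
    adj = np.asarray(adj, dtype=float)
    if node_weight is not None:
        # Scale each row i, i.e. all edges that lead into node i.
        adj = adj * np.asarray(node_weight, dtype=float)[:, None]

    # Column-normalise so each column is a probability distribution
    # over the next node; columns of isolated nodes are left as zeros.
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    transition = adj / col_sums

    p0 = np.zeros(adj.shape[0])
    p0[seed] = 1.0
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * transition @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy example: a three-node path graph 0 - 1 - 2, walk restarting at node 0.
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])
diffusion_state = random_walk_with_restart(path, seed=0)
```

Running this once per node gives one diffusion state per gene; a method such as the one described would then compress those states into low-dimensional feature vectors, a step this sketch does not cover.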