Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, Nagpur, India.
Comput Biol Chem. 2019 Feb;78:367-374. doi: 10.1016/j.compbiolchem.2018.12.022. Epub 2018 Dec 31.
Mining patterns of co-expressed genes across a subset of conditions helps narrow the search space for the analysis of gene expression data. Identifying condition-specific key genes from large-scale gene expression data is a challenging task. A condition-specific key gene signifies the functional behavior of a group of co-expressed genes across a subset of conditions and can act as a biomarker of disease. In this paper, we propose a novel approach for identifying condition-specific key genes in Basal-Like Breast Cancer (BLBC) using a biclustering algorithm and a Gene Co-expression Network (GCN). The proposed approach has two stages. In the first stage, significant biclusters are extracted with the 'runibic' biclustering algorithm. In the second stage, condition-specific key genes are identified from the extracted significant biclusters with the help of the GCN. Using a difference matrix and a gene correlation matrix, we construct a biologically meaningful and statistically strong GCN. We also present the proposed approach as a process diagram and demonstrate the procedure on bicluster number 93 (Bic93). In the experimental results, 95% and 85% of the extracted biclusters are biologically significant at p-values less than 0.05 and 0.01, respectively. We compared the proposed approach with an approach based on Weighted Gene Co-expression Network Analysis (WGCNA). In this comparison, our approach performed effectively, extracted biologically significant biclusters, and identified condition-specific key genes that cannot be extracted using the WGCNA-based approach. Important known key genes identified include PIK3CA, SHC3, ERBB2, SHC4, PTOV1, STAG1, and ZNF215. After rigorous analysis, these key genes can be used as diagnostic and prognostic biomarkers for BLBC. The identified condition-specific key genes can help reduce analysis time and increase the accuracy of further research such as biomarker identification and drug target discovery.
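The second stage described above can be illustrated with a minimal sketch, under stated assumptions rather than as the authors' implementation. It takes the expression submatrix of one bicluster (genes by conditions), builds a gene correlation matrix with Pearson correlation, connects genes whose absolute correlation reaches a cutoff, and ranks genes by their number of connections. The function names `build_gcn` and `rank_key_genes`, the cutoff of 0.7, the use of degree as the ranking score, and the toy data are all hypothetical. The abstract does not define the difference matrix, so that step is omitted here.

```python
import numpy as np

def build_gcn(expr, threshold=0.7):
    """Build an unweighted co-expression network for one bicluster.

    expr: 2-D array, rows are genes, columns are conditions.
    Returns the Pearson correlation matrix and a boolean adjacency matrix.
    The threshold is an assumed cutoff, not a value from the paper.
    """
    corr = np.corrcoef(expr)              # gene-by-gene Pearson correlation
    adj = np.abs(corr) >= threshold       # edge if |r| reaches the cutoff
    np.fill_diagonal(adj, False)          # drop self-loops
    return corr, adj

def rank_key_genes(gene_names, adj, top_k=3):
    """Rank genes by degree (number of co-expression partners)."""
    degree = adj.sum(axis=1)
    order = np.argsort(-degree, kind="stable")
    return [(gene_names[i], int(degree[i])) for i in order[:top_k]]

# Toy bicluster with hypothetical gene labels: "hub" tracks both g1 and g2,
# while g1 and g2 are uncorrelated with each other and g3 is unrelated.
genes = ["hub", "g1", "g2", "g3"]
expr = np.array([
    [7.0, 5.0, 5.0, 3.0, 5.0, 5.0],   # hub
    [6.0, 4.0, 6.0, 4.0, 5.0, 5.0],   # g1
    [6.0, 6.0, 4.0, 4.0, 5.0, 5.0],   # g2
    [5.0, 5.0, 5.0, 5.0, 6.0, 4.0],   # g3
])

corr, adj = build_gcn(expr, threshold=0.7)
ranked = rank_key_genes(genes, adj, top_k=1)
print(ranked)
```

In this toy network the gene labelled "hub" has the most connections and is ranked first. A real analysis would replace the degree score and the fixed cutoff with whatever criteria the full paper specifies.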