Kourou Konstantina, Rigas George, Papaloukas Costas, Mitsis Michalis, Fotiadis Dimitrios I
Unit of Medical Technology and Intelligent Information Systems, Dept. of Materials Science and Engineering, University of Ioannina, GR 45110, Greece; Dept. of Biological Applications and Technology, University of Ioannina, Ioannina, GR, 45110, Greece.
Unit of Medical Technology and Intelligent Information Systems, Dept. of Materials Science and Engineering, University of Ioannina, GR 45110, Greece.
Comput Biol Med. 2020 Jan;116:103577. doi: 10.1016/j.compbiomed.2019.103577. Epub 2019 Dec 9.
Genomic profiling in cancer studies has generated comprehensive gene expression patterns for diverse phenotypes. Computational methods that employ transcriptomics datasets have been proposed to model gene expression data. Dynamic Bayesian Networks (DBNs) have been used for modeling time series datasets and for inferring regulatory networks. Furthermore, cancer classification through DBN-based approaches could reveal the importance of exploiting knowledge from statistically significant genes and key regulatory molecules. Although microarray datasets have been employed extensively by several classification methods for decision making, the use of new knowledge from the pathway level has not been addressed adequately in the literature on DBNs for cancer classification. In the present study, we identify the genes that act as regulators and mediate the activity of transcription factors found in all promoters of our differentially expressed gene sets. These features serve as potential priors for distinguishing tumor from normal samples using a DBN-based classification approach. We employed three microarray datasets from the Gene Expression Omnibus (GEO) public functional genomics repository and performed differential expression analysis. Promoter and pathway analysis of the identified genes revealed the key regulators that influence the transcription mechanisms of these genes. We applied the DBN algorithm to selected genes and identified the features that can accurately classify the samples into tumors and controls. Both accuracy and Area Under the Curve (AUC) were high for the gene sets comprising the differentially expressed genes along with their master regulators (accuracy: 70.8%-98.5%; AUC: 0.562-0.985).
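The abstract reports both accuracy and AUC for the tumor versus control classification. As a reading aid, the two metrics can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the function names, the toy labels and scores, and the 0.5 decision threshold are all hypothetical, and the scores merely stand in for whatever class probabilities a DBN-based classifier might output.

```python
def accuracy(labels, predictions):
    """Fraction of samples whose predicted class matches the true class."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen tumor sample (label 1) receives a higher score than a randomly
    chosen control sample (label 0). Tied scores count as half a win."""
    tumor = [s for y, s in zip(labels, scores) if y == 1]
    control = [s for y, s in zip(labels, scores) if y == 0]
    if not tumor or not control:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for t in tumor:
        for c in control:
            if t > c:
                wins += 1.0
            elif t == c:
                wins += 0.5
    return wins / (len(tumor) * len(control))


# Hypothetical toy data: 1 = tumor, 0 = control (not taken from the study).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

# Assumed decision threshold of 0.5 for turning scores into class calls.
predictions = [1 if s >= 0.5 else 0 for s in scores]

print("accuracy:", accuracy(labels, predictions))
print("AUC:", auc(labels, scores))
```

Accuracy depends on the chosen threshold and on the class balance, whereas AUC summarizes how well the scores rank tumors above controls across all thresholds. This is one reason the two reported ranges need not move together.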