Faculty of Computer Sciences and Information Technology, Universiti Tun Hussein Onn Malaysia, Batu Pahat 83000, Malaysia.
Department of Electrical Engineering, Universitas Ahmad Dahlan, Yogyakarta 55166, Indonesia.
Genes (Basel). 2023 Feb 24;14(3):574. doi: 10.3390/genes14030574.
The integration of microarray technologies and machine learning methods has become popular for predicting the pathological condition of diseases and discovering risk genes. Traditional microarray analysis treats a pathway as a simple gene set, handling all genes in the pathway identically and ignoring the structural information of the pathway network. This study proposes an entropy-based directed random walk (e-DRW) method to infer pathway activities. It enhances the conventional DRW in two ways: (1) it increases the coverage of human pathway information by constructing two input networks for pathway activity inference, and (2) it improves the gene-weighting method in DRW by incorporating correlation coefficient values and t-test statistic scores. To evaluate these enhancements, gene expression datasets were used as input datasets, and pathway datasets were used as reference datasets to build two directed graphs. Within-dataset experiments indicated that the e-DRW method outperformed the other methods in classification accuracy and in the robustness of the predicted risk-active pathways. In conclusion, the results show that e-DRW not only improved prediction performance but also effectively extracted topologically important pathways and genes that were specifically related to the corresponding cancer types.
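The general idea of weighting genes with a directed random walk can be illustrated with a minimal sketch. It is not the e-DRW implementation: the abstract does not give the exact weighting formula, so the initial weights here use only the absolute Welch t statistic as a stand-in (the correlation and entropy terms are omitted). The restart probability, the toy graph, the expression values, and all function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic for two lists of expression values."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def directed_random_walk(edges, w0, restart=0.7, tol=1e-10, max_iter=1000):
    """Random walk with restart on a directed gene graph.

    edges   : list of (source, target) gene pairs
    w0      : dict mapping gene -> non-negative initial weight
    restart : probability of jumping back to the initial distribution
              (an illustrative value, not taken from the abstract)
    Returns a dict mapping gene -> stationary weight (weights sum to 1).
    """
    genes = sorted(w0)
    total = sum(w0.values())
    p0 = {g: w0[g] / total for g in genes}  # normalised initial weights

    out = {g: [] for g in genes}  # outgoing neighbours of each gene
    for src, dst in edges:
        out[src].append(dst)

    p = dict(p0)
    for _ in range(max_iter):
        nxt = {g: restart * p0[g] for g in genes}
        for src in genes:
            moving = (1 - restart) * p[src]
            if out[src]:
                # Split the walking mass evenly over outgoing edges.
                for dst in out[src]:
                    nxt[dst] += moving / len(out[src])
            else:
                # Gene without outgoing edges: send its mass back to p0.
                for g in genes:
                    nxt[g] += moving * p0[g]
        delta = sum(abs(nxt[g] - p[g]) for g in genes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy data (hypothetical): three genes, three samples per class.
tumor = {"A": [5.1, 4.8, 5.5], "B": [2.0, 2.2, 1.9], "C": [3.3, 3.1, 3.4]}
normal = {"A": [3.0, 3.2, 2.9], "B": [2.1, 2.0, 2.3], "C": [3.0, 3.5, 3.2]}
edges = [("A", "B"), ("B", "C")]  # a small directed pathway A -> B -> C

initial = {g: abs(welch_t(tumor[g], normal[g])) for g in tumor}
scores = directed_random_walk(edges, initial)
for gene, score in sorted(scores.items()):
    print(gene, round(score, 4))
```

The resulting weights combine each gene's own differential expression with its position in the directed graph, which is the property that distinguishes this family of methods from treating a pathway as a flat gene set.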