Turkish Medicines and Medical Devices Agency, Ankara, Turkey.
Department of Medical Biology, School of Medicine, SANKO University, Gaziantep, Turkey.
Med Biol Eng Comput. 2022 Oct;60(10):2877-2897. doi: 10.1007/s11517-022-02641-w. Epub 2022 Aug 10.
Numerous studies have investigated how tumor location (proximal versus distal) relates to prognosis and treatment efficacy in colorectal cancer. However, left- and right-sided colorectal cancers differ in their molecular pathways and prognoses, and this difference has not been fully investigated at the genomic level. In this study, a set of data science approaches, comprising six feature selection methods and three classification models, was used to predict tumor location from gene expression profiles. Classification performance was evaluated using specificity, sensitivity, accuracy, and the Matthews correlation coefficient (MCC). Gene Ontology enrichment analysis was performed with the PANTHER Classification System. For the 50 most significant genes, protein-protein interactions and drug-gene interactions were analyzed using GeneMANIA, Cytoscape, CytoHubba, MCODE, and DGIdb. The highest classification accuracy (90%) was achieved with the 200 most significant genes when the ensemble decision tree classifier was combined with the ReliefF feature selection method. Molecular pathways and drug interactions were investigated for the 50 most significant genes. The results suggest that a machine-learning-based approach could be useful for discovering significant genes that may play an important role in the development of new therapies and drugs for colorectal cancer.
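To make the evaluation step concrete, the sketch below computes the four reported metrics from a two-class confusion matrix and ranks features with a simplified two-class ReliefF. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the choice of "left" as the positive class, the Manhattan distance on range-normalized features, and the toy data are all hypothetical, and the ensemble decision tree classifier is not reproduced.

```python
import math


def binary_metrics(y_true, y_pred, positive):
    """Sensitivity, specificity, accuracy and MCC for a two-class problem.
    `positive` is the label treated as the positive class (an assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    acc = (tp + tn) / len(pairs)
    # Matthews correlation coefficient; defined as 0 when a marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sens, "specificity": spec, "accuracy": acc, "mcc": mcc}


def relieff_weights(X, y, k=3):
    """Simplified two-class ReliefF: a feature gains weight when it differs
    between an instance and its k nearest misses, and loses weight when it
    differs between the instance and its k nearest hits."""
    n, d = len(X), len(X[0])
    lo = [min(row[f] for row in X) for f in range(d)]
    hi = [max(row[f] for row in X) for f in range(d)]
    span = [(h - l) or 1.0 for l, h in zip(lo, hi)]  # avoid division by zero

    def diff(f, a, b):
        # Range-normalized difference of feature f between instances a and b.
        return abs(X[a][f] - X[b][f]) / span[f]

    def dist(a, b):
        # Manhattan distance on normalized features (hypothetical choice).
        return sum(diff(f, a, b) for f in range(d))

    w = [0.0] * d
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist(i, j))
        hits = [j for j in others if y[j] == y[i]][:k]
        misses = [j for j in others if y[j] != y[i]][:k]
        for f in range(d):
            if hits:
                w[f] -= sum(diff(f, i, h) for h in hits) / (n * len(hits))
            if misses:
                w[f] += sum(diff(f, i, m) for m in misses) / (n * len(misses))
    return w


def top_k_features(weights, k):
    """Indices of the k highest-weighted features (e.g. the top 200 genes)."""
    return sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)[:k]
```

On a toy matrix where column 0 separates the classes and column 1 is noise, `relieff_weights` should rank column 0 first; in the study's setting each column would be a gene and the top-ranked subset would be passed to the classifier.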