Uysal İlhan
Burdur Mehmet Akif Ersoy University, Bucak Zeliha Tolunay School of Applied Technology and Business Administration, Department of Information Systems and Technologies, Burdur, Türkiye.
Comput Biol Med. 2025 Sep;196(Pt B):110862. doi: 10.1016/j.compbiomed.2025.110862. Epub 2025 Aug 2.
Pheochromocytoma (PCC) is a rare neuroendocrine tumor driven by complex molecular mechanisms, notably involving the oncogenic c-Myc/Max and c-Myc/c-Max protein complexes. Despite the pivotal role of these complexes in tumor progression, their molecular interactions and the bioactive compounds that specifically target them remain inadequately characterized. This study presents an integrative computational pipeline that combines interpretable bioinformatics, network biology, and machine learning to elucidate key molecular mechanisms and bioactive motifs associated with PCC. A curated dataset of 5000 bioactive molecules was obtained from ChEMBL, and structural motifs associated with bioactivity were identified using a genetic programming-based approach. Random Forest, Support Vector Machine, and Gradient Boosting classifiers were trained and evaluated with 10-fold cross-validation to predict pIC50 values, achieving high performance (mean accuracy: 0.98, AUC >0.97). Feature importance analysis consistently identified pIC50, molecular weight (MW), lipophilicity (LogP), and hydrogen-bonding properties as primary determinants of bioactivity. Protein-protein interaction (PPI) networks were built from STRING's experimentally validated interactions and refined using BioGRID and literature cross-validation. Network centrality analysis and community detection with the Girvan-Newman algorithm revealed MYC, MAX, and EP300 as central hubs, with associated protein modules significantly enriched for biological processes including transcriptional regulation, cell cycle control, ubiquitination, and apoptosis. To enhance model interpretability, explainable artificial intelligence (XAI) methods, including SHAP and DALEX, were employed to quantify the contribution of individual molecular descriptors and to give a mechanistic account of compound-target interactions. Despite its robustness, this computational framework has not been validated experimentally or on independent external datasets. Additionally, STRING's uniform confidence scores limited the precision of edge weights in network visualizations. Nevertheless, this study demonstrates the potential of a multi-layered computational approach to deepen the understanding of MYC-driven oncogenesis in PCC. By integrating motif discovery, network biology, and interpretable machine learning, the work identifies actionable molecular signatures and critical protein targets, providing a foundation for future experimental validation and the development of targeted therapies in pheochromocytoma as well as other rare cancers.
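The abstract reports classification metrics (accuracy, AUC) for models that predict pIC50 under 10-fold cross-validation, but it does not say how pIC50 was turned into class labels or how the folds were drawn. The following is a minimal sketch under stated assumptions, not the study's implementation: it uses the standard definition pIC50 = -log10(IC50 in mol/L), a hypothetical activity cutoff of pIC50 >= 6.0, and a plain unshuffled split. The function names and the cutoff are illustrative only.

```python
import math

# Hypothetical cutoff (pIC50 >= 6.0, i.e. IC50 <= 1 µM); not stated in the abstract.
ACTIVE_THRESHOLD = 6.0


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 mol/L, so -log10(ic50_nm * 1e-9) = 9 - log10(ic50_nm)
    return 9.0 - math.log10(ic50_nm)


def label_active(ic50_nm: float, threshold: float = ACTIVE_THRESHOLD) -> bool:
    """Binarize a compound as active/inactive using the assumed pIC50 cutoff."""
    return pic50_from_ic50_nm(ic50_nm) >= threshold


def k_fold_indices(n_samples: int, k: int = 10):
    """Yield (train, test) index lists for a plain, unshuffled k-fold split."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


if __name__ == "__main__":
    # 5000 molecules, as in the abstract; each fold holds out 500 of them.
    folds = list(k_fold_indices(5000, k=10))
    print(len(folds), len(folds[0][0]), len(folds[0][1]))
```

In practice the split would normally be shuffled or stratified by class, and each fold's training indices would be passed to the chosen classifier. Those steps are omitted here because the abstract gives no detail on them.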