Tevosyan Ani, Yeghiazaryan Hrach, Tadevosyan Gohar, Apresyan Lilit, Atoyan Vahe, Misakyan Anna, Navoyan Zaven, Stopper Helga, Babayan Nelly, Khondkaryan Lusine
Institute of Molecular Biology, NAS RA, Hasratyan 7, Yerevan 0014, Armenia; Toxometris.ai, Sarmen str.7, Yerevan 0019, Armenia.
Toxometris.ai, Sarmen str.7, Yerevan 0019, Armenia.
Mutat Res Genet Toxicol Environ Mutagen. 2025 Apr;903:503858. doi: 10.1016/j.mrgentox.2025.503858. Epub 2025 Feb 26.
This study aimed to develop an in silico model for predicting human carcinogenicity using Graph Neural Networks (GNNs) trained with a multitask learning (MTL) approach. The MTL framework used auxiliary tasks (mutagenicity, genotoxicity, animal carcinogenicity, and androgen and estrogen receptor binding) to improve prediction on the primary task of human carcinogenicity. Three distinct GNN architectures were evaluated with various combinations of auxiliary tasks to assess how the performance metrics varied. Multitask learning significantly improved the predictive performance of the GNN models over single-task learning for human carcinogenicity. The best-performing MTL model achieved an area under the curve of 0.89, a balanced accuracy of 82%, and sensitivity and specificity of 0.75 and 0.89, respectively. The tasks in the MTL models represent assays for identifying both genotoxic and non-genotoxic carcinogens, which improves the models' ability to predict human carcinogenic risk. The GNN models were also effective in handling the data imbalance frequently observed in biological datasets, mitigating the bias that typically favors one class over another. Overall, these results underscore the promise of GNN-based MTL models for reliable chemical screening and prioritization, particularly for predicting human carcinogenicity.
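The reported metrics can be cross-checked with a short calculation. The sketch below assumes the standard definition of balanced accuracy as the mean of sensitivity and specificity; the abstract does not state the formula. The confusion-matrix counts are hypothetical, chosen only to reproduce the reported rates on an imbalanced set, and are not taken from the study.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of the true-positive rate and the true-negative rate."""
    return (sensitivity + specificity) / 2.0


def rates_from_counts(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true positives recovered
    specificity = tn / (tn + fp)  # fraction of true negatives recovered
    return sensitivity, specificity


# Values reported in the abstract for the best MTL model.
reported = balanced_accuracy(0.75, 0.89)
print(f"balanced accuracy from reported rates: {reported:.2f}")

# Hypothetical imbalanced test set (NOT from the paper):
# 100 carcinogens and 900 non-carcinogens.
tp, fn, tn, fp = 75, 25, 801, 99
sens, spec = rates_from_counts(tp, fn, tn, fp)
plain_accuracy = (tp + tn) / (tp + fn + tn + fp)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
print(f"plain accuracy={plain_accuracy:.3f} "
      f"balanced accuracy={balanced_accuracy(sens, spec):.2f}")
```

Under this definition, the reported sensitivity and specificity are consistent with the reported balanced accuracy of 82%. With the hypothetical counts, plain accuracy is pulled toward the majority class, while balanced accuracy weights both classes equally, which is why it is the more informative figure for imbalanced data.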