A deep ensemble framework for human essential gene prediction by integrating multi-omics data.

Author information

Zhang Xue, Xiao Weijia, Cochran Brent, Xiao Wangxin

Affiliations

College of Information Science and Engineering, Provincial Key Laboratory of Informational Service for Rural Area of Southwestern Hunan, Shaoyang University, Shaoyang, 422000, China.

Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA.

Publication information

Sci Rep. 2025 Jul 21;15(1):26407. doi: 10.1038/s41598-025-99164-9.

Abstract

Essential genes are necessary for the survival or reproduction of a living organism. Predicting and analyzing gene essentiality can advance our understanding of basic biology and human disease, and can support the development of new drugs. We propose DeEPsnap, a snapshot ensemble deep neural network method for predicting human essential genes. DeEPsnap integrates features derived from DNA and protein sequence data with features extracted or learned from four types of functional data: gene ontology, protein complexes, protein domains, and protein-protein interaction networks. More than 200 features are extracted or learned from these data and combined to train a series of cost-sensitive deep neural networks. The snapshot mechanism makes it possible to train multiple models without additional training effort or cost. In 10-fold cross-validation, DeEPsnap predicts human gene essentiality with an average AUROC of 96.16%, AUPRC of 93.83%, and accuracy of 92.36%. In comparative experiments, DeEPsnap outperforms several popular traditional machine learning and deep learning models, although all of those models also perform well when trained on the features created for DeEPsnap. These results demonstrate that DeEPsnap is effective for predicting human essential genes.
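
The abstract does not give DeEPsnap's architecture, learning-rate schedule, or class weights, so the following is only a minimal sketch of the general idea behind a cost-sensitive snapshot ensemble, not the authors' implementation. It assumes a cyclic cosine-annealed learning rate, saves one snapshot of the weights at the end of each cycle of a single training run, and averages the snapshots' predictions. A logistic model stands in for the deep neural network to keep the example short, and every name and value here (`train_snapshots`, `pos_weight`, the toy data) is hypothetical.

```python
import math
import random


def cosine_cyclic_lr(step, steps_per_cycle, lr_max):
    """Cosine-annealed learning rate that restarts at lr_max every cycle."""
    pos = step % steps_per_cycle
    return 0.5 * lr_max * (1.0 + math.cos(math.pi * pos / steps_per_cycle))


def sigmoid(z):
    # Clamp the logit so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def predict_one(weights, x):
    """Probability of the positive class; weights[-1] is the bias term."""
    z = weights[-1] + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)


def train_snapshots(X, y, n_cycles=3, steps_per_cycle=200,
                    lr_max=0.5, pos_weight=4.0, seed=0):
    """Run one SGD training pass and keep a weight snapshot per LR cycle."""
    rng = random.Random(seed)
    n_features = len(X[0])
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n_features + 1)]
    snapshots = []
    for step in range(n_cycles * steps_per_cycle):
        lr = cosine_cyclic_lr(step, steps_per_cycle, lr_max)
        i = rng.randrange(len(X))
        p = predict_one(weights, X[i])
        # Cost-sensitive loss: errors on the minority (positive) class
        # are scaled by pos_weight (an assumed value, not from the paper).
        cost = pos_weight if y[i] == 1 else 1.0
        grad = cost * (p - y[i])  # gradient of weighted cross-entropy w.r.t. logit
        for j in range(n_features):
            weights[j] -= lr * grad * X[i][j]
        weights[-1] -= lr * grad
        # The learning rate is near zero here, so the model has settled:
        # store a copy before the next warm restart.
        if (step + 1) % steps_per_cycle == 0:
            snapshots.append(list(weights))
    return snapshots


def ensemble_predict(snapshots, x):
    """Average the positive-class probability over all snapshots."""
    return sum(predict_one(w, x) for w in snapshots) / len(snapshots)


def make_toy_data(n=300, seed=1):
    """Synthetic, imbalanced two-feature data (about 20% positives)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.2 else 0
        center = 1.5 if label == 1 else -1.5
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y


X, y = make_toy_data()
snapshots = train_snapshots(X, y)
accuracy = sum(
    (ensemble_predict(snapshots, x) >= 0.5) == (t == 1) for x, t in zip(X, y)
) / len(y)
print(f"snapshots: {len(snapshots)}, training accuracy: {accuracy:.3f}")
```

Because all snapshots come from one training run, the ensemble costs roughly the same to train as a single model, which is the property the abstract highlights. With a convex stand-in model the snapshots end up very similar; the diversity that makes such an ensemble useful would come from a non-convex network settling into different minima after each restart.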

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5883/12280155/f7ec046092b4/41598_2025_99164_Fig1_HTML.jpg
