Essential gene prediction in Drosophila melanogaster using machine learning approaches based on sequence and functional features.

Author information

Aromolaran Olufemi, Beder Thomas, Oswald Marcus, Oyelade Jelili, Adebiyi Ezekiel, Koenig Rainer

Affiliations

Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria.

Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany.

Publication information

Comput Struct Biotechnol J. 2020 Mar 10;18:612-621. doi: 10.1016/j.csbj.2020.02.022. eCollection 2020.

Abstract

Genes are termed essential if their loss of function compromises viability or results in a profound loss of fitness. On the genome scale, these genes can be determined experimentally using RNAi or knockout screens, but this is very resource intensive. Computational methods for essential gene prediction can overcome this drawback, particularly when intrinsic features (e.g. from the protein sequence) as well as extrinsic features (e.g. from transcription profiles) are considered. In this work, we employed machine learning to predict essential genes in Drosophila melanogaster. A total of 27,340 features were generated, covering nucleotide and protein sequences, gene networks, protein-protein interactions, evolutionary conservation and functional annotations. Employing cross-validation, we obtained an excellent prediction performance. In D. melanogaster, the best model achieved a ROC-AUC of 0.90, a PR-AUC of 0.30 and an F1 score of 0.34. Our approach considerably outperformed a benchmark method in which only features derived from the protein sequences were used (P < 0.001). Investigating which features contributed to this success, we found that all categories of features contributed, most prominently network topological, functional and sequence-based features. To evaluate our approach, we performed the same workflow for essential gene prediction in human and achieved ROC-AUC = 0.97, PR-AUC = 0.73, and F1 = 0.64. In summary, this study shows that our well-elaborated assembly of features, covering a broad range of intrinsic and extrinsic gene and protein features, enabled intelligent systems to predict the essentiality of genes in an organism well.
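
The abstract reports three metrics (ROC-AUC, PR-AUC and F1) that can diverge sharply when one class is rare. The following is a minimal sketch of how each can be computed from true labels and predicted scores. It is not the authors' code: the function names, the 0.5 decision threshold, the use of average precision as the PR-AUC estimate, and the synthetic data with 5% positives are all assumptions made for illustration.

```python
import random


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outscores a
    random negative (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """PR-AUC estimated as average precision: the mean of the precision
    values measured at the rank of each true positive.
    Assumes no tied scores."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, precision_sum = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits


def f1_score(labels, scores, threshold=0.5):
    """F1 at a fixed decision threshold (0.5 is an illustrative choice)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Synthetic, imbalanced toy data (hypothetical): 5% positives whose
# scores are shifted upwards relative to the negatives.
random.seed(0)
toy_labels = [1] * 100 + [0] * 1900
toy_scores = [random.gauss(1.5, 1.0) for _ in range(100)] + \
             [random.gauss(0.0, 1.0) for _ in range(1900)]

print("ROC-AUC:", round(roc_auc(toy_labels, toy_scores), 3))
print("PR-AUC :", round(average_precision(toy_labels, toy_scores), 3))
print("F1     :", round(f1_score(toy_labels, toy_scores, threshold=1.5), 3))
```

A random classifier scores about 0.5 on ROC-AUC but only about the fraction of positives on PR-AUC, so the two metrics should be read against different baselines.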


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5b3e/7096750/c06ccf4c3e5a/ga1.jpg
