DeepCellEss: cell line-specific essential protein prediction with attention-based interpretable deep learning.

Affiliations

Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China.

Division of Biomedical Engineering, Department of Computer Science, Department of Mechanical Engineering University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada.

Publication information

Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btac779.

DOI:10.1093/bioinformatics/btac779

PMID:36458923

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9825760/

Abstract

MOTIVATION

Protein essentiality is usually accepted to be a conditional trait and strongly affected by cellular environments. However, existing computational methods often do not take such characteristics into account, preferring to incorporate all available data and train a general model for all cell lines. In addition, the lack of model interpretability limits further exploration and analysis of essential protein predictions.

RESULTS

In this study, we proposed DeepCellEss, a sequence-based interpretable deep learning framework for cell line-specific essential protein predictions. DeepCellEss utilizes a convolutional neural network and bidirectional long short-term memory to learn short- and long-range latent information from protein sequences. Further, a multi-head self-attention mechanism is used to provide residue-level model interpretability. For model construction, we collected extremely large-scale benchmark datasets across 323 cell lines. Extensive computational experiments demonstrate that DeepCellEss yields effective prediction performance for different cell lines and outperforms existing sequence-based methods as well as network-based centrality measures. Finally, we conducted some case studies to illustrate the necessity of considering specific cell lines and the superiority of DeepCellEss. We believe that DeepCellEss can serve as a useful tool for predicting essential proteins across different cell lines.
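As a rough illustration of how a multi-head self-attention layer can yield residue-level scores, the sketch below computes scaled dot-product attention weights per head and averages the attention each residue receives. It is a minimal sketch under stated assumptions, not the DeepCellEss implementation: the per-residue feature vectors are random stand-ins for what a CNN and BiLSTM would produce, queries and keys are taken directly from those features without learned projections, and every function name (`self_attention_weights`, `split_heads`, `residue_importance`, `demo`) is hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_weights(features):
    """Scaled dot-product self-attention weights for a single head.

    features: list of L vectors (one per residue), each of length d.
    Assumption: queries and keys are the features themselves
    (no learned projection matrices), to keep the sketch short.
    Returns an L x L matrix; row i is a distribution over residues.
    """
    scale = math.sqrt(len(features[0]))
    weights = []
    for q in features:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in features]
        weights.append(softmax(scores))
    return weights


def split_heads(features, n_heads):
    """Slice each d-dimensional vector into n_heads equal chunks."""
    d = len(features[0])
    if d % n_heads != 0:
        raise ValueError("feature dimension must be divisible by n_heads")
    size = d // n_heads
    return [[vec[h * size:(h + 1) * size] for vec in features]
            for h in range(n_heads)]


def residue_importance(head_weights):
    """Average attention received by each residue over heads and queries."""
    n_heads = len(head_weights)
    length = len(head_weights[0])
    scores = [0.0] * length
    for weights in head_weights:
        for row in weights:
            for j, a in enumerate(row):
                scores[j] += a
    return [s / (n_heads * length) for s in scores]


def demo(sequence="MKTAYIAKQR", dim=8, n_heads=4, seed=0):
    """Pair each residue of a toy sequence with its attention score."""
    rng = random.Random(seed)
    # Hypothetical stand-in for CNN/BiLSTM outputs: random features per residue.
    features = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in sequence]
    heads = [self_attention_weights(h) for h in split_heads(features, n_heads)]
    return list(zip(sequence, residue_importance(heads)))


if __name__ == "__main__":
    for residue, score in demo():
        print(f"{residue}\t{score:.3f}")
```

Because every attention row is a probability distribution, the averaged scores sum to 1 across the sequence, so they can be read as a relative ranking of residues rather than as absolute importance values.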

AVAILABILITY AND IMPLEMENTATION

The DeepCellEss web server is available at http://csuligroup.com:8000/DeepCellEss. The source code and data underlying this study can be obtained from https://github.com/CSUBioGroup/DeepCellEss.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
