Deep learning program to predict protein functions based on sequence information.

Authors

Ko Chang Woo, Huh June, Park Jong-Wan

Affiliations

Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, Republic of Korea.

Department of Pharmacology, Seoul National University College of Medicine, Seoul, Republic of Korea.

Publication information

MethodsX. 2022 Jan 15;9:101622. doi: 10.1016/j.mex.2022.101622. eCollection 2022.

DOI:10.1016/j.mex.2022.101622

PMID:35111575

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8790617/

Abstract

Deep learning technologies have been adopted to predict the functions of newly identified proteins in silico. However, most current models are not suitable for poorly characterized proteins because they require diverse information on the target proteins. We designed a binary classification deep learning program that requires only sequence information. This program was named 'FUTUSA' (function teller using sequence alone). During sequence feature extraction by a convolutional neural network, it applies sequence segmentation to learn regional sequence patterns and their relationships. This segmentation process improved predictive performance by 49% compared with the full-length process. Compared with a baseline method, our approach achieved higher performance in predicting oxidoreductase activity. FUTUSA also showed strong performance in predicting acetyltransferase and demethylase activities. Next, we tested whether FUTUSA can predict the functional consequences of point mutations. After being trained for monooxygenase activity, FUTUSA successfully predicted the impact of point mutations on phenylalanine hydroxylase, the enzyme whose deficiency causes the inherited metabolic disease phenylketonuria (PKU). This deep learning program can be used as a first-step tool for characterizing newly identified or poorly studied proteins.

• We propose a new deep learning program that predicts protein functions in silico and requires nothing more than the protein sequence.

• Applying sequence segmentation improves prediction efficiency.

• The method makes it possible to predict the clinical impact of mutations or polymorphisms.
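The abstract describes sequence segmentation only at a high level, so the following is a minimal sketch of the general idea, not FUTUSA's actual implementation. It assumes non-overlapping fixed-length segments, zero-vector padding, and one-hot encoding over the 20 standard amino acids. The function names, the `segment_len` parameter, and the example peptide are all hypothetical choices made for illustration.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(residue: str) -> List[int]:
    """Return a 20-dim one-hot vector; padding and unknown residues map to all zeros."""
    vec = [0] * len(AMINO_ACIDS)
    idx = AA_INDEX.get(residue)
    if idx is not None:
        vec[idx] = 1
    return vec


def segment_sequence(seq: str, segment_len: int = 100, pad: str = "-") -> List[str]:
    """Split a sequence into non-overlapping segments, right-padding the last one.

    The segment length and the padding scheme are assumptions for this sketch.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    segments = [seq[i:i + segment_len] for i in range(0, len(seq), segment_len)]
    if segments:
        segments[-1] = segments[-1].ljust(segment_len, pad)
    return segments


def encode_segments(seq: str, segment_len: int = 100) -> List[List[List[int]]]:
    """Encode a sequence as [n_segments][segment_len][20] nested lists."""
    return [
        [one_hot(residue) for residue in segment]
        for segment in segment_sequence(seq.upper(), segment_len)
    ]


# Usage with an arbitrary 28-residue example peptide and 10-residue segments.
encoded = encode_segments("MSTAVLENPGLGRKLSDFGQETSYIEDN", segment_len=10)
print(len(encoded), len(encoded[0]), len(encoded[0][0]))  # prints "3 10 20"
```

Each segment could then be passed through a convolutional feature extractor separately, with a downstream layer combining the per-segment features. How FUTUSA itself combines regional features is not stated in the abstract.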

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5562/8790617/79884619cb9a/ga1.jpg
