

Deep learning program to predict protein functions based on sequence information.

Authors

Ko Chang Woo, Huh June, Park Jong-Wan

Affiliations

Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, Republic of Korea.

Department of Pharmacology, Seoul National University College of Medicine, Seoul, Republic of Korea.

Publication information

MethodsX. 2022 Jan 15;9:101622. doi: 10.1016/j.mex.2022.101622. eCollection 2022.

Abstract

Deep learning technologies have been adopted to predict the functions of newly identified proteins in silico. However, most current models are not suitable for poorly characterized proteins because they require diverse information about the target proteins. We designed a binary classification deep learning program that requires only sequence information. The program was named 'FUTUSA' (function teller using sequence alone). During sequence feature extraction by a convolutional neural network, it segments the sequence so that the model learns regional sequence patterns and the relationships among them. This segmentation improved predictive performance by 49% compared with processing the full-length sequence. Compared with a baseline method, our approach achieved higher performance in predicting oxidoreductase activity. FUTUSA also performed strongly in predicting acetyltransferase and demethylase activities. Next, we tested whether FUTUSA can predict the functional consequences of point mutations. After being trained for monooxygenase activity, FUTUSA successfully predicted the impact of point mutations on phenylalanine hydroxylase, the enzyme responsible for the inherited metabolic disease phenylketonuria (PKU). This deep learning program can be used as a first-step tool for characterizing newly identified or poorly studied proteins.

•We propose a new deep learning program that predicts protein functions in silico and requires nothing more than the protein sequence.

•Applying sequence segmentation improves the efficiency of prediction.

•This method makes it possible to predict the clinical impact of mutations or polymorphisms.
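The segmentation idea described in the abstract can be illustrated with a minimal sketch. It is not the FUTUSA implementation: the abstract does not state the segment length, the padding scheme, or the encoding, so the non-overlapping fixed-length segments, the `X` padding character, the one-hot encoding, and all function names below are assumptions made for illustration only. The helper `apply_point_mutation` is likewise hypothetical and only shows how a variant sequence could be prepared before scoring it with a trained classifier.

```python
# Illustrative sketch only; segment length, padding, and encoding are assumptions,
# not details taken from the FUTUSA paper.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def segment_sequence(seq, segment_len, pad="X"):
    """Split a sequence into consecutive, non-overlapping segments.

    The last segment is right-padded with `pad` so every segment has the
    same length, which a convolutional layer would require.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    segments = [seq[i:i + segment_len] for i in range(0, len(seq), segment_len)]
    if segments and len(segments[-1]) < segment_len:
        segments[-1] = segments[-1].ljust(segment_len, pad)
    return segments


def one_hot(segment):
    """Encode one segment as a list of 20-dimensional one-hot rows.

    Padding or non-standard residues become all-zero rows.
    """
    rows = []
    for aa in segment:
        row = [0] * len(AMINO_ACIDS)
        idx = AA_INDEX.get(aa)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    return rows


def encode(seq, segment_len):
    """Return a nested list shaped (n_segments, segment_len, 20)."""
    return [one_hot(s) for s in segment_sequence(seq.upper(), segment_len)]


def apply_point_mutation(seq, mutation):
    """Apply a substitution written as <ref><1-based position><alt>, e.g. 'T3W'."""
    ref, pos, alt = mutation[0], int(mutation[1:-1]), mutation[-1]
    if not 1 <= pos <= len(seq) or seq[pos - 1] != ref:
        raise ValueError("reference residue does not match the sequence")
    return seq[:pos - 1] + alt + seq[pos:]


if __name__ == "__main__":
    toy = "MSTAVLENPG"  # toy sequence, not a real protein
    print(segment_sequence(toy, 4))
    print(apply_point_mutation(toy, "T3W"))
```

In a setup like this, each encoded segment would be passed through the same convolutional feature extractor and the per-segment features combined before the binary classification layer; the wild-type and mutated sequences would be encoded the same way and their predicted scores compared.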


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5562/8790617/79884619cb9a/ga1.jpg
