Twenty years of advances in prediction of nucleic acid-binding residues in protein sequences.

Authors

Basu Sushmita, Yu Jing, Kihara Daisuke, Kurgan Lukasz

Affiliations

Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Richmond, VA 23284, United States.

Department of Biological Sciences, Purdue University, 915 Mitch Daniels Boulevard, West Lafayette, IN 47907, United States.

Publication information

Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbaf016.

Abstract

Computational prediction of nucleic acid-binding residues in protein sequences is an active field of research, with over 80 methods that were released in the past 2 decades. We identify and discuss 87 sequence-based predictors that include dozens of recently published methods that are surveyed for the first time. We overview historical progress and examine multiple practical issues that include availability and impact of predictors, key features of their predictive models, and important aspects related to their training and assessment. We observe that the past decade has brought increased use of deep neural networks and protein language models, which contributed to substantial gains in the predictive performance. We also highlight advancements in vital and challenging issues that include cross-predictions between deoxyribonucleic acid (DNA)-binding and ribonucleic acid (RNA)-binding residues and targeting the two distinct sources of binding annotations, structure-based versus intrinsic disorder-based. The methods trained on the structure-annotated interactions tend to perform poorly on the disorder-annotated binding and vice versa, with only a few methods that target and perform well across both annotation types. The cross-predictions are a significant problem, with some predictors of DNA-binding or RNA-binding residues indiscriminately predicting interactions with both nucleic acid types. Moreover, we show that methods with web servers are cited substantially more than tools without implementation or with no longer working implementations, motivating the development and long-term maintenance of the web servers. We close by discussing future research directions that aim to drive further progress in this area.
