dSPRINT: predicting DNA, RNA, ion, peptide and small molecule interaction sites within protein domains.

Affiliations

Lewis-Sigler Institute for Integrative Genomics, Princeton University, Carl Icahn Laboratory, Princeton, NJ 08544, USA.

Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08544, USA.

Publication information

Nucleic Acids Res. 2021 Jul 21;49(13):e78. doi: 10.1093/nar/gkab356.

Abstract

Domains are instrumental in facilitating protein interactions with DNA, RNA, small molecules, ions and peptides. Identifying ligand-binding domains within sequences is a critical step in protein function annotation, and the ligand-binding properties of proteins are frequently analyzed based upon whether they contain one of these domains. To date, however, knowledge of whether and how protein domains interact with ligands has been limited to domains that have been observed in co-crystal structures; this leaves approximately two-thirds of human protein domain families uncharacterized with respect to whether and how they bind DNA, RNA, small molecules, ions and peptides. To fill this gap, we introduce dSPRINT, a novel ensemble machine learning method for predicting whether a domain binds DNA, RNA, small molecules, ions or peptides, along with the positions within it that participate in these types of interactions. In stringent cross-validation testing, we demonstrate that dSPRINT achieves excellent performance in uncovering ligand-binding positions and domains. We also apply dSPRINT to newly characterize the molecular functions of domains of unknown function. dSPRINT's predictions can be transferred from domains to sequences, enabling predictions about the ligand-binding properties of 95% of human genes. The dSPRINT framework and its predictions for 6503 human protein domains are freely available at http://protdomain.princeton.edu/dsprint.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/533b/8287948/41d4425a8df8/gkab356fig1.jpg
