A protein sequence fitness function for identifying natural and nonnatural proteins.

Affiliation

Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, Yokohama, Kanagawa, Japan.

Publication information

Proteins. 2020 Oct;88(10):1271-1284. doi: 10.1002/prot.25900. Epub 2020 May 28.

DOI: 10.1002/prot.25900
PMID: 32415863
Abstract

The infinitesimally small fraction of sequence space explored over millions of years of evolution suggests that natural proteins are constrained by functional prerequisites and should differ from randomly generated sequences. We have developed a protein sequence fitness scoring function that uses sequence and corresponding secondary structure information at the tripeptide level to differentiate natural from nonnatural proteins. The proposed fitness function is extensively validated on a dataset of about 210 000 natural and nonnatural protein sequences and benchmarked against existing methods for differentiating natural and nonnatural proteins. The high sensitivity, specificity, and accuracy (0.81, 0.95, and 91%, respectively) of the fitness function demonstrate its potential for sampling protein sequences with a higher probability of mimicking natural proteins. Moreover, the four major classes of proteins (α proteins, β proteins, α/β proteins, and α + β proteins) are analyzed separately, and β proteins are found to score slightly lower than the other classes. Further, an analysis of about 250 designed proteins (taken from previously reported cases) helped to define the boundaries for sampling ideal protein sequences. Protein sequence characterization aided by the proposed fitness function could facilitate the exploration of new perspectives in the design of novel functional proteins.
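To make the idea of a tripeptide-level score concrete, the following is a minimal sketch and not the authors' published function. It assumes a simple log-odds formulation in which each overlapping (residue triplet, secondary structure triplet) pair is scored by how much more often it occurs in a natural set than in a background set. The names `tripeptides`, `train_counts`, `fitness`, and `confusion_metrics` are hypothetical, as are the pseudocount smoothing and the toy sequences, which are not data from the paper. The last helper shows how sensitivity, specificity, and accuracy of the kind quoted in the abstract are computed from binary predictions.

```python
import math
from collections import Counter


def tripeptides(seq, ss):
    """Yield (residue triplet, secondary-structure triplet) pairs along a chain."""
    if len(seq) != len(ss):
        raise ValueError("sequence and secondary-structure strings differ in length")
    for i in range(len(seq) - 2):
        yield seq[i:i + 3], ss[i:i + 3]


def train_counts(examples):
    """Count tripeptide features over an iterable of (sequence, structure) pairs."""
    counts = Counter()
    for seq, ss in examples:
        counts.update(tripeptides(seq, ss))
    return counts


def fitness(seq, ss, natural, background, pseudocount=1.0):
    """Mean log-odds of the chain's features (natural vs background counts).

    Assumption: additive smoothing with one extra slot for unseen features.
    """
    feats = list(tripeptides(seq, ss))
    if not feats:
        return 0.0
    vocab = len(set(natural) | set(background)) + 1
    n_total = sum(natural.values()) + pseudocount * vocab
    b_total = sum(background.values()) + pseudocount * vocab
    score = 0.0
    for f in feats:
        p_nat = (natural[f] + pseudocount) / n_total
        p_bg = (background[f] + pseudocount) / b_total
        score += math.log(p_nat / p_bg)
    return score / len(feats)


def confusion_metrics(labels, predictions):
    """Return (sensitivity, specificity, accuracy) for boolean labels/predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)
    tn = sum(1 for y, p in pairs if not y and not p)
    fp = sum(1 for y, p in pairs if not y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return sensitivity, specificity, accuracy


# Toy, made-up training sets (H = helix, C = coil); purely illustrative.
natural = train_counts([("ACDEFG", "HHHHHH")])
background = train_counts([("GFEDCA", "CCCCCC")])

natural_like = fitness("ACDEFG", "HHHHHH", natural, background)
random_like = fitness("GFEDCA", "CCCCCC", natural, background)
```

In this sketch a positive score means the chain's tripeptide features are more typical of the natural set, and thresholding the score yields the binary predictions that `confusion_metrics` evaluates. A real implementation would need large training sets and secondary structure assignments from experiment or prediction.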
