Protein Classes Predicted by Molecular Surface Chemical Features: Machine Learning-Assisted Classification of Cytosol and Secreted Proteins.

Affiliations

Department of Materials Science and Engineering, School of Materials Science and Chemical Technology, Tokyo Institute of Technology, 4259 Nagatsuta-cho, Midori-ku, Yokohama-shi, Kanagawa-ken 226-8502, Japan.

The Institute for Solid State Physics, The University of Tokyo, 5-1-5, Kashiwanoha, Kashiwa, Chiba 277-0882, Japan.

Publication information

J Phys Chem B. 2024 Sep 5;128(35):8423-8436. doi: 10.1021/acs.jpcb.4c02461. Epub 2024 Aug 26.

DOI:10.1021/acs.jpcb.4c02461
PMID:39185763
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11382266/
Abstract

Chemical structures of protein surfaces govern intermolecular interaction, and protein functions include specific molecular recognition, transport, self-assembly, etc. Therefore, the relationship between the chemical structure and protein functions provides insights into the understanding of the mechanism underlying protein functions and developments of new biomaterials. In this study, we analyze protein surface features, including surface amino acid populations and secondary structure ratios, instead of entire sequences as input for the classifier, intending to provide deeper insights into the determination of protein classes (cytosol or secreted). We employed a random forest-based classifier for the prediction of protein locations. Our training and testing data sets consisting of secreted and cytosol proteins were constructed using filtered information from UniProt and 3D structures from AlphaFold. The classifier achieved a testing accuracy of 93.9% with a feature importance ranking and quantitative boundary values for the top three features. We discuss the significance of these features quantitatively and the hidden rules to determine the protein classes (cytosol or secreted).
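
The kind of input the abstract describes (surface amino acid populations and secondary structure ratios) and the idea of a quantitative boundary value for a single feature can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' pipeline: the per-residue tuple layout, the relative solvent accessibility cutoff of 0.25 used to call a residue "surface", the three-state secondary structure labels, and the helper names `surface_features` and `best_threshold` are all hypothetical. The threshold search is a single Gini-impurity split, which is the basic building block of the decision trees inside a random forest, not a random forest itself.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard one-letter codes


def surface_features(residues, rsa_cutoff=0.25):
    """Build a feature dict from per-residue annotations.

    residues: list of (aa, rsa, ss) tuples (hypothetical layout), where
      aa  is a one-letter amino acid code,
      rsa is relative solvent accessibility in [0, 1],
      ss  is a secondary structure state: 'H', 'E' or 'C'.
    rsa_cutoff is an assumed surface definition, not taken from the paper.
    """
    surface = [r for r in residues if r[1] >= rsa_cutoff]
    aa_counts = Counter(aa for aa, _, _ in surface)
    ss_counts = Counter(ss for _, _, ss in residues)
    n_surface = max(len(surface), 1)  # avoid division by zero
    n_total = max(len(residues), 1)
    # Fraction of surface residues that are each amino acid type.
    features = {f"surf_{aa}": aa_counts.get(aa, 0) / n_surface
                for aa in AMINO_ACIDS}
    # Fraction of all residues in each secondary structure state.
    for ss in "HEC":
        features[f"ss_{ss}"] = ss_counts.get(ss, 0) / n_total
    return features


def gini(labels):
    """Gini impurity for binary labels (assumed: 0 = cytosol, 1 = secreted)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) of the best single split.

    Returns (None, impurity_of_all_labels) if no split improves on it.
    """
    pairs = sorted(zip(values, labels))
    best = (None, gini(labels))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no boundary between identical feature values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


if __name__ == "__main__":
    toy_protein = [("K", 0.8, "H"), ("L", 0.05, "H"),
                   ("C", 0.5, "E"), ("K", 0.4, "C")]
    feats = surface_features(toy_protein)
    print(feats["surf_K"], feats["ss_H"])
    # Toy feature values for four proteins with made-up class labels.
    print(best_threshold([0.0, 0.25, 0.75, 1.0], [0, 0, 1, 1]))
```

In practice one would compute such a feature dictionary per protein, stack the dictionaries into a table, and pass it to an ensemble classifier; the single-split search only shows where a per-feature boundary value comes from.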


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8139/11382266/ada3fa7ef11f/jp4c02461_0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8139/11382266/47430f0718e3/jp4c02461_0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8139/11382266/45fb5dee4752/jp4c02461_0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8139/11382266/fc651100d3ac/jp4c02461_0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8139/11382266/8b4f902bb22f/jp4c02461_0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8139/11382266/e1f15025b5f8/jp4c02461_0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8139/11382266/ae248d4475df/jp4c02461_0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8139/11382266/993f7bcb0d2c/jp4c02461_0008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8139/11382266/057aa3deafba/jp4c02461_0009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8139/11382266/58b798a0e5d7/jp4c02461_0010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8139/11382266/60739224726b/jp4c02461_0011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8139/11382266/fb31b403b36a/jp4c02461_0012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8139/11382266/4f83fb0921bf/jp4c02461_0013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8139/11382266/a5829018c8bc/jp4c02461_0014.jpg

