Protein Classes Predicted by Molecular Surface Chemical Features: Machine Learning-Assisted Classification of Cytosol and Secreted Proteins.

Affiliations

Department of Materials Science and Engineering, School of Materials Science and Chemical Technology, Tokyo Institute of Technology, 4259 Nagatsuta-cho, Midori-ku, Yokohama-shi, Kanagawa-ken 226-8502, Japan.

The Institute for Solid State Physics, The University of Tokyo, 5-1-5, Kashiwanoha, Kashiwa, Chiba 277-0882, Japan.

Publication information

J Phys Chem B. 2024 Sep 5;128(35):8423-8436. doi: 10.1021/acs.jpcb.4c02461. Epub 2024 Aug 26.

Abstract

The chemical structures of protein surfaces govern intermolecular interactions, and protein functions include specific molecular recognition, transport, self-assembly, and others. The relationship between surface chemical structure and protein function can therefore shed light on the mechanisms underlying protein functions and guide the development of new biomaterials. In this study, we use protein surface features, including surface amino acid populations and secondary structure ratios, rather than entire sequences as input to the classifier, aiming to gain deeper insight into what determines protein class (cytosol or secreted). We employed a random forest-based classifier to predict protein location. The training and testing data sets, consisting of secreted and cytosol proteins, were constructed using filtered information from UniProt and 3D structures from AlphaFold. The classifier achieved a testing accuracy of 93.9% and yielded a feature importance ranking together with quantitative boundary values for the top three features. We discuss the significance of these features quantitatively, along with the hidden rules that determine protein class (cytosol or secreted).
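
To make the idea of "surface amino acid populations and secondary structure ratios" concrete, the following is a minimal sketch of how such features might be computed for one protein. It is not the authors' pipeline. The per-residue tuple layout, the function name `surface_features`, the relative solvent accessibility cutoff of 0.25, and the choice to compute secondary structure ratios over surface residues only are all assumptions made for illustration; the abstract does not specify any of them.

```python
from collections import Counter

# Hypothetical cutoff: residues whose relative solvent accessibility (RSA)
# exceeds this value are treated as "surface". The abstract does not state
# which definition of surface the authors used.
RSA_CUTOFF = 0.25


def surface_features(residues, rsa_cutoff=RSA_CUTOFF):
    """Return surface amino acid fractions and secondary structure ratios.

    residues: iterable of (amino_acid, rsa, secondary_structure) tuples,
    for example ("K", 0.61, "H"). This layout is an illustrative assumption.
    """
    # Keep only residues that count as exposed under the assumed cutoff.
    surface = [(aa, ss) for aa, rsa, ss in residues if rsa > rsa_cutoff]
    if not surface:
        return {}
    n = len(surface)
    aa_counts = Counter(aa for aa, _ in surface)
    ss_counts = Counter(ss for _, ss in surface)
    # Fractions are taken relative to the number of surface residues.
    features = {f"aa_{aa}": count / n for aa, count in aa_counts.items()}
    features.update({f"ss_{ss}": count / n for ss, count in ss_counts.items()})
    return features


# Toy input: five residues, one of which (L) is buried.
example = [
    ("K", 0.61, "H"),
    ("L", 0.05, "H"),
    ("E", 0.48, "H"),
    ("C", 0.30, "E"),
    ("K", 0.72, "C"),
]
features = surface_features(example)
```

A feature dictionary of this kind, computed per protein, is the sort of fixed-length input a random forest classifier could be trained on, with the fitted model then providing a feature importance ranking.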

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8139/11382266/ada3fa7ef11f/jp4c02461_0001.jpg
