Application of Machine Learning in the Quantitative Analysis of the Surface Characteristics of Highly Abundant Cytoplasmic Proteins: Toward AI-Based Biomimetics.

Author information

Moon Jooa, Hu Guanghao, Hayashi Tomohiro

Affiliations

Department of Materials Science and Engineering, School of Materials and Chemical Technology, Tokyo Institute of Technology, Yokohama 226-8502, Japan.

The Institute for Solid State Physics, The University of Tokyo, Kashiwa 277-0882, Japan.

Publication information

Biomimetics (Basel). 2024 Mar 6;9(3):162. doi: 10.3390/biomimetics9030162.

Abstract

Proteins in the crowded environment of human cells have often been studied with regard to nonspecific interactions, misfolding, and aggregation, which can cause cellular malfunction and disease. Highly abundant proteins are especially susceptible to these problems because of the law of mass action. Therefore, the surfaces of highly abundant cytoplasmic (HAC) proteins, which are directly exposed to this environment, may exhibit specific physicochemical, structural, and geometrical characteristics that reduce nonspecific interactions and adapt the proteins to their surroundings. However, the quantitative relationships among the full set of surface descriptors remain to be clarified. Here, we used machine learning to identify HAC proteins from the hydrophobicity, charge, roughness, secondary structures, and B-factor of their surfaces, and we quantified the contribution of each descriptor. First, several supervised learning algorithms were compared on a binary classification task that distinguishes the surfaces of HAC proteins from those of extracellular proteins. Then, logistic regression was selected for the feature importance analysis on the basis of both model performance (80.2% accuracy and 87.6% AUC) and interpretability. HAC proteins showed positive correlations with negatively and positively charged areas but negative correlations with hydrophobicity, the B-factor, the proportion of beta structures, roughness, and the proportion of disordered regions. Finally, each descriptor was interpreted in terms of the adaptive surface strategies that HAC proteins use to regulate nonspecific interactions, protein folding, flexibility, stability, and adsorption. This study presents a novel approach that uses diverse surface descriptors to identify HAC proteins and provides quantitative design rules for protein surfaces well suited to the crowded environment of human cells.
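The abstract does not say exactly how feature importance was read from the logistic regression model. One common approach, assumed here rather than taken from the paper, is to z-score every descriptor so that all coefficients share a scale. You then fit the model and rank descriptors by coefficient magnitude. The sign of each coefficient gives the direction of association with the HAC class. The sketch below implements that idea in plain Python, along with the two metrics reported above (accuracy and ROC AUC). Several elements are hypothetical and do not come from the study: the helper names, the hyperparameters (`lr`, `epochs`, `l2`), the descriptor names (`charged_area`, `hydrophobicity`), and the toy values.

```python
import math
from statistics import mean, pstdev

def standardize(X):
    """Z-score each column so coefficient magnitudes are comparable across descriptors."""
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in X]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=2000, l2=0.0):
    """Batch gradient descent on the mean log-loss, with an optional L2 penalty."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(X, w, b):
    """Predicted probability of class 1 (here: HAC) for each row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

def accuracy(scores, labels, threshold=0.5):
    """Fraction of rows whose thresholded score matches the label."""
    return mean(1.0 if (s >= threshold) == (l == 1) else 0.0
                for s, l in zip(scores, labels))

def roc_auc(scores, labels):
    """P(random positive outranks random negative), ties count 1/2 (normalized Mann-Whitney U)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def rank_features(names, w):
    """Sort descriptors by |coefficient|; the sign gives the direction of association."""
    return sorted(zip(names, w), key=lambda t: abs(t[1]), reverse=True)

if __name__ == "__main__":
    # Toy, made-up values (NOT data from the study): 1 = HAC, 0 = extracellular.
    names = ["charged_area", "hydrophobicity"]
    X = [[0.2, 0.8], [0.3, 0.7], [0.6, 0.4], [0.7, 0.2]]
    y = [0, 0, 1, 1]
    Xs = standardize(X)
    w, b = fit_logistic(Xs, y)
    probs = predict_proba(Xs, w, b)
    print(rank_features(names, w))
    print("accuracy", accuracy(probs, y), "AUC", roc_auc(probs, y))
```

With real data, you would fit on a training split and compute accuracy and AUC on held-out proteins rather than on the training set. scikit-learn's `LogisticRegression` and `roc_auc_score` provide tested versions of these pieces.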
