Kang Seungpyo, Kim Minseon, Sun Jiwon, Lee Myeonghun, Min Kyoungmin
School of Mechanical Engineering, Soongsil University, 369 Sangdo-ro, Dongjak-gu 06978, Seoul, Republic of Korea.
School of Systems Biomedical Science, Soongsil University, 369 Sangdo-ro, Dongjak-gu 06978, Seoul, Republic of Korea.
ACS Biomater Sci Eng. 2023 Nov 13;9(11):6451-6463. doi: 10.1021/acsbiomaterials.3c01001. Epub 2023 Oct 16.
Protein aggregation occurs when misfolded or unfolded proteins physically bind together, and it can promote the development of various amyloid diseases. This study aimed to construct surrogate models for predicting protein aggregation via data-driven methods using two types of databases. First, an aggregation propensity score database was constructed by calculating scores for protein structures in the Protein Data Bank using Aggrescan3D 2.0. Feature- and graph-based models for predicting protein aggregation were then developed using this database. The graph-based model outperformed the feature-based model, achieving an R² of 0.95, although it intrinsically requires protein structures as input. Second, for the experimental data, a feature-based model was built using the Curated Protein Aggregation Database 2.0 to predict aggregation intensity curves. In summary, this study identifies which approaches predict protein aggregation more effectively, depending on the type of descriptor and the database.
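As an illustration of the feature-based route described in the abstract, the sketch below turns a protein sequence into a fixed-length descriptor vector that a regression model could consume. The abstract does not specify which descriptors were used, so the choice here (amino-acid composition plus mean Kyte-Doolittle hydropathy) and the function name `sequence_features` are assumptions made for illustration only, not the authors' method.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(KD)  # fixed alphabetical order of one-letter codes


def sequence_features(seq):
    """Hypothetical descriptor: 20 composition fractions + mean hydropathy."""
    # Keep only standard residues; unknown symbols such as 'X' are skipped.
    residues = [aa for aa in seq.upper() if aa in KD]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    n = len(residues)
    # Fraction of each amino acid, in the order given by AMINO_ACIDS.
    composition = [residues.count(aa) / n for aa in AMINO_ACIDS]
    # Average hydropathy over the sequence (higher means more hydrophobic).
    mean_hydropathy = sum(KD[aa] for aa in residues) / n
    return composition + [mean_hydropathy]


features = sequence_features("AAVV")
```

The resulting 21-element vector could be passed to any tabular regressor. A graph-based model would instead need residue-level structural information, which is consistent with the abstract's remark that it intrinsically requires protein structures.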