Prediction of Protein Secondary Structures Based on Substructural Descriptors of Molecular Fragments.

Authors

Zakharov Oleg S, Rudik Anastasia V, Filimonov Dmitry A, Lagunin Alexey A

Affiliations

Department of Bioinformatics, Pirogov Russian National Research Medical University, 117997 Moscow, Russia.

Department of Bioinformatics, Institute of Biomedical Chemistry, 119121 Moscow, Russia.

Publication information

Int J Mol Sci. 2024 Nov 21;25(23):12525. doi: 10.3390/ijms252312525.

Abstract

The accurate prediction of secondary structures of proteins (SSPs) is a critical challenge in molecular biology and structural bioinformatics. Despite recent advancements, this task remains complex and demands further exploration. This study presents a novel approach to SSP prediction using atom-centric substructural multilevel neighborhoods of atoms (MNA) descriptors for protein molecular fragments. A dataset comprising over 335,000 SSPs, annotated by the Dictionary of Secondary Structure in Proteins (DSSP) software from 37,000 proteins, was constructed from Protein Data Bank (PDB) records with a resolution of 2 Å or better. Protein fragments were converted into structural formulae using the RDKit Python package and stored in SD files using the MOL V3000 format. Classification sequence-structure-property relationships (SSPR) models were developed with varying levels of MNA descriptors and a Bayesian algorithm implemented in MultiPASS software. The average AUC for eight SSP types, calculated via leave-one-out cross-validation, was 0.902. On the independent test sets, the best SSPR models achieved an AUC of 0.860, a Q3 of 77.32%, and a Q8 of 70.92% for ASTRAL, and an AUC of 0.889, a Q3 of 78.78%, and a Q8 of 74.74% for CB513. Based on the created models, a freely available web application, MNA-PSS-Pred, was developed.
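
The abstract reports Q3 and Q8 without defining them. As a minimal sketch, assuming the usual convention that Q8 is the percentage of residues whose predicted eight-state DSSP label matches the observed one and that Q3 is the same percentage after the eight states are collapsed to three, the two scores could be computed as below. The 8-to-3 mapping (H, G, I to helix; E, B to strand; everything else to coil) is one common reduction and is an assumption here; the paper may use a different one. The function names and the example strings are hypothetical.

```python
# Assumed 8-to-3 state reduction; the paper's own mapping may differ.
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",   # helical states -> helix
    "E": "E", "B": "E",             # strand and bridge -> strand
    "T": "C", "S": "C", "C": "C", "-": "C",  # turn, bend, loop -> coil
}


def q_accuracy(predicted: str, observed: str) -> float:
    """Percentage of positions where the predicted label equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def reduce_to_three(states: str) -> str:
    """Collapse an eight-state string to three states using the assumed mapping."""
    return "".join(EIGHT_TO_THREE[s] for s in states)


def q8(predicted: str, observed: str) -> float:
    """Per-residue agreement on the eight-state labels."""
    return q_accuracy(predicted, observed)


def q3(predicted: str, observed: str) -> float:
    """Per-residue agreement after reducing both strings to three states."""
    return q_accuracy(reduce_to_three(predicted), reduce_to_three(observed))


# Hypothetical per-residue labels for a ten-residue fragment.
observed = "HHHHGGTTEE"
predicted = "HHHGGGTSEB"

print(f"Q8 = {q8(predicted, observed):.2f}%")
print(f"Q3 = {q3(predicted, observed):.2f}%")
```

In this toy example the three mismatches (H/G, T/S, E/B) all fall within the same reduced class, so they lower Q8 but not Q3. The same effect is why reported Q3 values are typically at least as high as Q8 values for a given model.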

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c42/11641695/0a1703192af8/ijms-25-12525-g001.jpg
