SOLeNNoID: a deep learning pipeline for solenoid residue detection in protein structures.

Authors

Nikov Georgi I, Pretorius Daniella, Murray James W

Affiliation

Life Sciences, Imperial College, London SW7 2AZ, United Kingdom.

Publication

Bioinformatics. 2025 Aug 2;41(8). doi: 10.1093/bioinformatics/btaf415.

Abstract

MOTIVATION

Solenoid proteins, a subset of tandem repeat proteins, have structurally distinct, modular, and elongated architectures that differentiate them from globular proteins. These proteins play essential roles in diverse biological processes, including protein binding, enzymatic catalysis, ice binding, and nucleic acid interactions. Despite their biological significance and increasing commercial applications (such as therapeutic engineered variants like DARPins and designed PPR proteins), accurate identification and annotation of solenoid structures remain challenging. Because solenoid structures are more conserved than their sequences, and recent advances in protein structure prediction have made structural models widely available, structure-based solenoid detection methods are preferable to sequence-based ones.

RESULTS

We introduce SOLeNNoID, a deep-learning-based pipeline for predicting solenoid residues in protein structures. Our method employs a convolutional neural network architecture to analyse protein distance matrices, enabling accurate identification of solenoid-containing regions. SOLeNNoID covers all three solenoid subclasses: α-, α/β-, and β-solenoids. Comparative evaluation against existing structure-based methods demonstrates the superior performance of our approach. Applying SOLeNNoID to the entire Protein Data Bank led to a 71% increase in detected solenoid-containing entries compared to the gold-standard RepeatsDB database, significantly expanding the known solenoid protein repertoire.
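To make the input representation concrete, the following is a minimal sketch of how a residue-by-residue distance matrix can be computed from one 3D coordinate per residue (for example, Cα atoms). It is an illustration under stated assumptions, not SOLeNNoID's own code: the function name `distance_matrix`, the choice of one point per residue, and the toy coordinates are all hypothetical, and the abstract does not specify which atoms or preprocessing the pipeline uses.

```python
import numpy as np


def distance_matrix(coords: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances between N points given as an (N, 3) array.

    Hypothetical helper for illustration only; not part of SOLeNNoID.
    """
    coords = np.asarray(coords, dtype=float)
    # Broadcast to an (N, N, 3) array of coordinate differences.
    diff = coords[:, None, :] - coords[None, :, :]
    # Reduce over the xyz axis to get an (N, N) distance matrix.
    return np.sqrt((diff ** 2).sum(axis=-1))


# Toy coordinates (assumed values, not taken from any real structure).
toy_coords = np.array([
    [0.0, 0.0, 0.0],
    [3.0, 4.0, 0.0],
    [3.0, 4.0, 12.0],
])

dmat = distance_matrix(toy_coords)
print(dmat.shape)   # (3, 3)
print(dmat[0, 2])   # 13.0
```

The resulting matrix is symmetric with a zero diagonal, so it can be handled like a single-channel image. That is the kind of input a convolutional network can scan for the regular off-diagonal patterns that repeated structural units tend to produce.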

AVAILABILITY AND IMPLEMENTATION

SOLeNNoID is implemented in Python and available on GitHub at https://github.com/gnik2018/SOLeNNoID. The source code and pre-trained models are accessible under a free-software license. Training data are available on Zenodo at https://zenodo.org/records/14927497.
