Construct a variable-length fragment library for de novo protein structure prediction.

Affiliation

College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China.

Publication information

Brief Bioinform. 2022 May 13;23(3). doi: 10.1093/bib/bbac086.

Abstract

Although end-to-end methods such as AlphaFold2 have achieved remarkable results in structure prediction, fragment libraries remain essential for de novo protein structure prediction, which can help explore and understand the protein-folding mechanism. In this work, we developed a variable-length fragment library (VFlib). In VFlib, a master structure database was first constructed from the Protein Data Bank through sequence clustering. The hidden Markov model (HMM) profile of each protein in the master structure database was generated by HHsuite, and the secondary structure of each protein was calculated by DSSP. For the query sequence, the HMM profile was first constructed. Then, variable-length fragments were retrieved from the master structure database through dynamic variable-length profile-profile comparison. A complete method for chopping the query HMM profile during this process was proposed to obtain fragments with increased diversity. Finally, secondary structure information was used to further screen the retrieved fragments and generate the final fragment library for the specific query sequence. Experimental results on a set of 120 nonredundant proteins show that the global precision and coverage of the fragment library generated by VFlib were 55.04% and 94.95%, respectively, at an RMSD cutoff of 1.5 Å. Compared with the benchmark method NNMake, the global precision of our fragment library increased by 62.89% with equivalent coverage. Furthermore, the fragments generated by VFlib and NNMake were used to predict structure models through fragment assembly. Controlled experimental results demonstrate that the average TM-score of the models assembled from VFlib fragments was 16.00% higher than that of the models assembled from NNMake fragments.
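The abstract reports precision and coverage at an RMSD cutoff of 1.5 Å but does not define them. The sketch below illustrates one common convention from the fragment-library literature, which may differ from the exact definition used for VFlib: precision is taken as the fraction of fragments whose RMSD to the native structure falls below the cutoff, and coverage as the fraction of query residues spanned by at least one such fragment. The `Fragment` class, its fields, and the function name are hypothetical, and the RMSD values are assumed to have been computed beforehand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Hypothetical record for one retrieved fragment (not VFlib's own format)."""
    start: int    # 0-based index of the first query residue the fragment covers
    length: int   # number of residues; fragments may differ in length
    rmsd: float   # RMSD in angstroms to the native structure, assumed precomputed


def precision_and_coverage(fragments, query_length, cutoff=1.5):
    """Return (precision, coverage) under the assumed definitions.

    precision: share of fragments with RMSD below the cutoff.
    coverage:  share of query residues covered by at least one such fragment.
    """
    good = [f for f in fragments if f.rmsd < cutoff]
    precision = len(good) / len(fragments) if fragments else 0.0

    covered = set()
    for f in good:
        # Clip to the query length so an overhanging fragment cannot inflate coverage.
        covered.update(range(f.start, min(f.start + f.length, query_length)))
    coverage = len(covered) / query_length if query_length else 0.0
    return precision, coverage


# Toy library for a 10-residue query; the values are made up for illustration.
toy_library = [
    Fragment(start=0, length=4, rmsd=1.0),
    Fragment(start=2, length=5, rmsd=2.0),  # above the cutoff, so not counted as good
    Fragment(start=3, length=3, rmsd=1.2),
    Fragment(start=8, length=2, rmsd=0.5),
]
precision, coverage = precision_and_coverage(toy_library, query_length=10)
print(f"precision={precision:.2f} coverage={coverage:.2f}")
```

In this toy case three of the four fragments fall below the cutoff, and together they span residues 0 to 5 and 8 to 9, so eight of the ten positions are covered. A global figure over a benchmark set such as the 120 proteins mentioned above could be obtained by pooling fragments across targets or by averaging per-target values; the abstract does not say which was used.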
