
Accurate and efficient protein sequence design through learning concise local environment of residues.

Author information

Key Lab of Intelligent Information Processing, SKLP, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China.

University of Chinese Academy of Sciences, Beijing 100110, China.

Publication information

Bioinformatics. 2023 Mar 1;39(3). doi: 10.1093/bioinformatics/btad122.

Abstract

MOTIVATION

Computational protein sequence design has been widely applied in rational protein engineering, and improving its accuracy and efficiency is highly desirable.

RESULTS

Here, we present ProDESIGN-LE, an accurate and efficient approach to protein sequence design. ProDESIGN-LE adopts a concise but informative representation of each residue's local environment and trains a transformer to learn the correlation between the local environments of residues and their amino acid types. For a target backbone structure, ProDESIGN-LE uses the transformer to assign an appropriate residue type to each position based on its local environment within this structure, eventually producing a designed sequence in which all residues fit well with their local environments. We applied ProDESIGN-LE to design sequences for 68 naturally occurring and 129 hallucinated proteins, taking less than 20 s per protein on average. The predicted structures of the designed proteins closely resemble the target structures, with a state-of-the-art average TM-score exceeding 0.80. We further validated ProDESIGN-LE experimentally by designing five sequences for an enzyme, chloramphenicol O-acetyltransferase type III (CAT III), and recombinantly expressing the proteins in Escherichia coli. Of these proteins, three exhibited excellent solubility, and one yielded monomeric species with circular dichroism spectra consistent with those of the natural CAT III protein.
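The per-position assignment described above can be illustrated with a minimal sketch. This is not the ProDESIGN-LE implementation: the function names, the 10 Å neighbor radius, the starting sequence, the update loop, and the rule-based `toy_predictor` (which stands in for the trained transformer) are all assumptions made only for illustration.

```python
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def neighbors(coords, i, radius=10.0):
    """Indices of residues within `radius` of residue i (radius is an assumed value)."""
    return [j for j, c in enumerate(coords)
            if j != i and dist(coords[i], c) <= radius]


def toy_predictor(env):
    """Hypothetical stand-in for the trained transformer.

    `env` is a list of (neighbor_residue_type, distance) pairs. Returns a
    probability for each of the 20 amino acid types. The toy rule favors
    leucine at crowded positions and lysine at sparse ones.
    """
    favored = "L" if len(env) >= 3 else "K"
    probs = {aa: 0.01 for aa in AMINO_ACIDS}
    probs[favored] = 1.0 - 0.01 * (len(AMINO_ACIDS) - 1)
    return probs


def design(coords, predictor=toy_predictor, max_rounds=5):
    """Assign each position the most probable residue type for its environment.

    Positions are revisited until no assignment changes or `max_rounds`
    is reached (an assumed stopping rule).
    """
    seq = ["A"] * len(coords)  # arbitrary starting sequence (assumption)
    for _ in range(max_rounds):
        changed = False
        for i in range(len(coords)):
            env = [(seq[j], dist(coords[i], coords[j]))
                   for j in neighbors(coords, i)]
            probs = predictor(env)
            best = max(probs, key=probs.get)
            if best != seq[i]:
                seq[i] = best
                changed = True
        if not changed:
            break
    return "".join(seq)
```

Calling `design([(3.8 * i, 0.0, 0.0) for i in range(6)])` runs the loop on six points spaced 3.8 Å apart along a line. A real predictor would replace `toy_predictor` and would see a much richer description of each local environment than a neighbor list.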

AVAILABILITY AND IMPLEMENTATION

The source code of ProDESIGN-LE is available at https://github.com/bigict/ProDESIGN-LE.


