Department of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States.
BASF Enzymes, San Diego, California 92121, United States.
ACS Synth Biol. 2020 Aug 21;9(8):2154-2161. doi: 10.1021/acssynbio.0c00219. Epub 2020 Jul 27.
Signal peptides (SPs) are short (15-30 residue) chains of amino acids at the amino termini of expressed proteins that specify secretion in living cells. We trained an attention-based neural network, the Transformer model, on data from all available organisms in Swiss-Prot to generate SP sequences. Experimental testing demonstrates that the model-generated SPs are functional: when appended to enzymes expressed in an industrial strain, the SPs lead to secreted activity that is competitive with industrially used SPs. Additionally, the model-generated SPs are diverse in sequence, sharing as little as 58% sequence identity with the closest known native signal peptide and 73% ± 9% on average.
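The diversity figures above rest on a "sequence identity to the closest known native signal peptide" metric. The abstract does not state how identity was computed, so the following is only a minimal sketch under stated assumptions: a Needleman-Wunsch global alignment with unit match, mismatch, and gap scores, and identity defined as identical columns divided by total alignment columns. The function names and the toy sequences are hypothetical and are not taken from the paper.

```python
from typing import Iterable


def global_identity(a: str, b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -1) -> float:
    """Percent identity over a global (Needleman-Wunsch) alignment of a and b.

    Assumption: identity = identical columns / alignment columns * 100.
    The scoring values are illustrative defaults, not the paper's settings.
    """
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal alignment, counting columns and identities.
    i, j = n, m
    identical = 0
    columns = 0
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            if a[i - 1] == b[j - 1]:
                identical += 1
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return 100.0 * identical / columns if columns else 0.0


def closest_native_identity(generated: str, natives: Iterable[str]) -> float:
    """Highest percent identity between a generated SP and any native SP."""
    return max(global_identity(generated, native) for native in natives)


# Toy, made-up sequences for illustration only (not real signal peptides).
toy_natives = ["AAAA", "MKRL"]
print(closest_native_identity("MKKL", toy_natives))
```

With these toy inputs the closest match is `MKRL`, which differs from `MKKL` at one of four aligned positions. A real analysis would more likely use an established alignment tool and a substitution matrix, and the identity value depends on the denominator chosen (alignment length, shorter sequence, or longer sequence).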