Khwaja Emaad, Song Yun S, Agarunov Aaron, Huang Bo
UC Berkeley - UCSF Joint Bioengineering Graduate Program.
Computer Science Division, UC Berkeley, CA 94720.
Adv Neural Inf Process Syst. 2023 Dec;36:4899-4914.
We present CELL-E 2, a novel bidirectional transformer that can generate images depicting protein subcellular localization from amino acid sequences (and vice versa). Protein localization is a challenging problem that requires integrating sequence and image information, which most existing methods ignore. CELL-E 2 extends the work of CELL-E, not only capturing the spatial complexity of protein localization and producing probability estimates of localization atop a nucleus image, but also generating sequences from images, which enables protein design. We train and fine-tune CELL-E 2 on two large-scale datasets of human proteins. We also demonstrate how to use CELL-E 2 to create hundreds of novel nuclear localization signals (NLS). Results and interactive demos are featured at https://bohuanglab.github.io/CELL-E_2/.