Chan Zuckerberg Biohub, San Francisco, CA, USA.
Nat Methods. 2022 Aug;19(8):995-1003. doi: 10.1038/s41592-022-01541-z. Epub 2022 Jul 25.
Explaining the diversity and complexity of protein localization is essential to fully understand cellular architecture. Here we present cytoself, a deep-learning approach for fully self-supervised protein localization profiling and clustering. Cytoself leverages a self-supervised training scheme that does not require preexisting knowledge, categories or annotations. Training cytoself on images of 1,311 endogenously labeled proteins from the OpenCell database reveals a highly resolved protein localization atlas that recapitulates major scales of cellular organization, from coarse classes, such as nuclear and cytoplasmic, to the subtle localization signatures of individual protein complexes. We quantitatively validate cytoself's ability to cluster proteins into organelles and protein complexes, showing that cytoself outperforms previous self-supervised approaches. Moreover, to better understand the inner workings of our model, we dissect the emergent features from which our clustering is derived, interpret them in the context of the fluorescence images, and analyze the performance contributions of each component of our approach.