Department of Medicine, University of California San Diego, La Jolla, CA, USA.
Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA.
Nature. 2021 Dec;600(7889):536-542. doi: 10.1038/s41586-021-04115-9. Epub 2021 Nov 24.
The cell is a multi-scale structure with modular organization across at least four orders of magnitude. Two central approaches for mapping this structure, protein fluorescent imaging and protein biophysical association, each generate extensive datasets, but of distinct qualities and resolutions that are typically treated separately. Here we integrate immunofluorescence images in the Human Protein Atlas with affinity purifications in BioPlex to create a unified hierarchical map of human cell architecture. Integration is achieved by configuring each approach as a general measure of protein distance, then calibrating the two measures using machine learning. The map, known as the multi-scale integrated cell (MuSIC 1.0), resolves 69 subcellular systems, of which approximately half are to our knowledge undocumented. Accordingly, we perform 134 additional affinity purifications and validate subunit associations for the majority of systems. The map reveals a pre-ribosomal RNA processing assembly and accessory factors, which we show govern rRNA maturation, and functional roles for SRRM1 and FAM120C in chromatin and RPS3A in splicing. By integration across scales, MuSIC increases the resolution of imaging while giving protein interactions a spatial dimension, paving the way to incorporate diverse types of data in proteome-wide cell maps.
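The integration step described in the abstract (two per-modality protein distances calibrated into one) can be illustrated with a minimal sketch. The abstract does not specify the model, so everything here is an assumption made for illustration: each protein is taken to have a feature vector per modality, cosine distance stands in for the per-modality distance, and an ordinary least-squares fit stands in for the machine-learning calibration. The function names and the toy numbers are hypothetical and are not taken from MuSIC.

```python
import math


def cosine_distance(u, v):
    """Distance between two feature vectors (assumed stand-in for each modality)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def solve_linear(a, b):
    """Solve a small dense system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_calibration(pairs, reference):
    """Least-squares fit of reference ~ w0 + w1 * d_image + w2 * d_interaction.

    pairs: list of (d_image, d_interaction) for protein pairs with a known
    reference distance. Linear regression is a simple substitute here, not
    the model used in the paper.
    """
    rows = [[1.0, d_img, d_ppi] for d_img, d_ppi in pairs]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, reference)) for i in range(3)]
    return solve_linear(xtx, xty)


def calibrated_distance(weights, d_img, d_ppi):
    """Combine the two modality distances into a single calibrated distance."""
    return weights[0] + weights[1] * d_img + weights[2] * d_ppi


# Toy training data: hypothetical modality distances and reference distances.
train_pairs = [(0.1, 0.9), (0.4, 0.2), (0.8, 0.5), (0.3, 0.7)]
train_reference = [0.1 + 0.5 * a + 0.3 * b for a, b in train_pairs]
weights = fit_calibration(train_pairs, train_reference)

# Score a new protein pair from hypothetical image and interaction features.
d_image = cosine_distance([1.0, 0.2, 0.0], [0.9, 0.1, 0.3])
d_interaction = cosine_distance([0.0, 1.0], [0.5, 0.5])
print(calibrated_distance(weights, d_image, d_interaction))
```

A unified distance of this kind is what allows proteins to be grouped at several thresholds into a hierarchy of nested systems; in practice a nonlinear model would likely replace the linear fit used in this sketch.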