Department of Bioengineering, Stanford University, Stanford, California, USA.
Department of Medicine, University of California San Diego, La Jolla, California, USA.
Annu Rev Biomed Data Sci. 2024 Aug;7(1):369-389. doi: 10.1146/annurev-biodatasci-102423-113534. Epub 2024 Jul 24.
While the primary sequences of human proteins have been cataloged for over a decade, determining how these are organized into a dynamic collection of multiprotein assemblies, with structures and functions spanning biological scales, is an ongoing venture. Systematic and data-driven analyses of these higher-order structures are emerging, facilitating the discovery and understanding of cellular phenotypes. At present, knowledge of protein localization and function has been primarily derived from manual annotation and curation in resources such as the Gene Ontology, which are biased toward richly annotated genes in the literature. Here, we envision a future powered by data-driven mapping of protein assemblies. These maps can capture and decode cellular functions through the integration of protein expression, localization, and interaction data across length scales and timescales. In this review, we focus on progress toward constructing integrated cell maps that accelerate the life sciences and translational research.