Recursion, Salt Lake City, Utah, United States of America.
Genentech, South San Francisco, California, United States of America.
PLoS Comput Biol. 2024 Oct 1;20(10):e1012463. doi: 10.1371/journal.pcbi.1012463. eCollection 2024 Oct.
The continued scaling of genetic perturbation technologies combined with high-dimensional assays such as cellular microscopy and RNA-sequencing has enabled genome-scale reverse-genetics experiments that go beyond single-endpoint measurements of growth or lethality. Datasets emerging from these experiments can be combined to construct perturbative "maps of biology", in which readouts from various manipulations (e.g., CRISPR-Cas9 knockout, CRISPRi knockdown, compound treatment) are placed in unified, relatable embedding spaces, allowing for the generation of genome-scale sets of pairwise comparisons. These maps of biology capture known biological relationships and uncover new associations that can be used for downstream discovery tasks. Construction of these maps involves many technical choices in both experimental and computational protocols, motivating the design of benchmark procedures to evaluate map quality in a systematic, unbiased manner. Here, we (1) establish a standardized terminology for the steps involved in perturbative map building, (2) introduce key classes of benchmarks to assess the quality of such maps, (3) construct 18 maps from four genome-scale datasets employing different cell types, perturbation technologies, and data readout modalities, (4) generate benchmark metrics for the constructed maps and investigate the reasons for performance variations, and (5) demonstrate the utility of these maps to discover new biology by suggesting roles for two largely uncharacterized genes.
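As an illustration of what "pairwise comparisons" in a shared embedding space can look like, the following is a minimal sketch, not the authors' pipeline. It assumes cosine similarity as the comparison measure, which the abstract does not specify, and the perturbation names, vectors, and function names are all hypothetical toy values chosen for the example.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_similarities(embeddings):
    """Map every unordered pair of perturbations to its cosine similarity.

    `embeddings` is a dict of perturbation name -> embedding vector.
    Keys of the result are (name_a, name_b) tuples in sorted name order.
    """
    return {
        (a, b): cosine_similarity(embeddings[a], embeddings[b])
        for a, b in combinations(sorted(embeddings), 2)
    }


# Hypothetical toy embeddings (assumption: one vector per perturbation).
toy_map = {
    "GENE_A": [1.0, 0.0, 0.0],
    "GENE_B": [2.0, 0.0, 0.0],  # same direction as GENE_A
    "GENE_C": [0.0, 3.0, 0.0],  # orthogonal to GENE_A and GENE_B
}

sims = pairwise_similarities(toy_map)
for pair, score in sims.items():
    print(pair, round(score, 3))
```

In this toy setting, perturbations whose embeddings point in the same direction score near 1 and unrelated ones score near 0; at genome scale the same idea yields one score per pair of perturbations, which is the kind of quantity a benchmark could compare against known biological relationships.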