Imaging Platform, Broad Institute of MIT and Harvard, 02142, Cambridge, MA, USA.
Nat Commun. 2024 Aug 2;15(1):6516. doi: 10.1038/s41467-024-50613-5.
High-throughput image-based profiling platforms are powerful technologies capable of collecting data from billions of cells exposed to thousands of perturbations in a time- and cost-effective manner. Therefore, image-based profiling data has been increasingly used for diverse biological applications, such as predicting drug mechanism of action or gene function. However, batch effects severely limit community-wide efforts to integrate and interpret image-based profiling data collected across different laboratories and equipment. To address this problem, we benchmark ten high-performing single-cell RNA sequencing (scRNA-seq) batch correction techniques, representing diverse approaches, using a newly released Cell Painting dataset, JUMP. We focus on five scenarios with varying complexity, ranging from batches prepared in a single lab over time to batches imaged using different microscopes in multiple labs. We find that Harmony and Seurat RPCA are noteworthy, consistently ranking among the top three methods for all tested scenarios while maintaining computational efficiency. Our proposed framework, benchmark, and metrics can be used to assess new batch correction methods in the future. This work paves the way for improvements that enable the community to make the best use of public Cell Painting data for scientific discovery.