Bao Feng, Li Li, Hammerlindl Heinz, Shen Susan Q, Hammerlindl Sabrina, Altschuler Steven J, Wu Lani F
Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, USA.
Department of Psychiatry and Behavioral Sciences, University of California, San Francisco, San Francisco, CA, USA.
Nat Biotechnol. 2025 Jul 11. doi: 10.1038/s41587-025-02729-2.
High-content image-based phenotypic screens (HCSs) provide a scalable approach to characterize biological functions of compounds. The widespread adoption of HCS has led to a growing body of available profile datasets. However, study-specific experimental and computational choices lead to profile datasets that cannot be directly combined. A critical, long-standing challenge is how to integrate these rich but currently isolated HCS dataset resources. Here we introduce a contrastive, deep-learning framework that leverages sparse sets of overlapping profiles as fiducials to align heterogeneous HCS profile datasets in a shared latent space. We demonstrate that this alignment facilitates accurate 'transitive' predictions, whereby the function of an uncharacterized compound screened in one dataset can be predicted through comparison with characterized compounds already profiled in other datasets. In silico alignment of HCS resources provides a path to unify fast-growing HCS resources and accelerate early drug discovery efforts.
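The abstract does not specify the loss function, encoder architecture, or similarity measure. The sketch below is a rough illustration under stated assumptions, not the authors' implementation. It assumes a standard symmetric InfoNCE contrastive objective: each fiducial compound's embedding from dataset A is pulled toward its embedding from dataset B and pushed away from every other compound in the batch. It then applies a cosine nearest-neighbour lookup for the "transitive" step in the shared latent space. The encoders that would produce these embeddings are omitted. The names `info_nce_loss` and `transitive_predict`, the temperature of 0.1, and the toy labels `moa_A`/`moa_B`/`moa_C` are all hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    # Row-wise L2 normalization so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce_loss(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE loss over paired fiducial embeddings (assumed objective).

    Row i of z_a and row i of z_b are embeddings of the same fiducial compound
    profiled in dataset A and dataset B; all other rows serve as negatives.
    """
    z_a, z_b = l2_normalize(z_a), l2_normalize(z_b)
    logits = z_a @ z_b.T / temperature  # (n, n) scaled cosine similarities

    def cross_entropy_diag(l):
        # Numerically stable row-wise log-softmax; positives sit on the diagonal.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the A->B and B->A directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))


def transitive_predict(query_emb, ref_emb, ref_labels):
    # Give each uncharacterized query compound the annotation of its most
    # similar characterized reference compound in the shared latent space.
    sims = l2_normalize(query_emb) @ l2_normalize(ref_emb).T
    return [ref_labels[j] for j in sims.argmax(axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(8, 16))
    # Well-aligned fiducial pairs should give a lower loss than mismatched pairs.
    print(info_nce_loss(z, z + 0.01 * rng.normal(size=z.shape)))
    print(info_nce_loss(z, z[::-1]))
    # Toy transitive lookup with hypothetical reference annotations.
    refs = np.eye(3)
    queries = np.array([[0.9, 0.1, 0.0], [0.0, 0.2, 0.95]])
    print(transitive_predict(queries, refs, ["moa_A", "moa_B", "moa_C"]))
```

In this reading, minimizing such a loss over the sparse fiducial pairs is what places the heterogeneous datasets in one latent space. Once the datasets are aligned, a compound screened only in one dataset can be compared directly against annotated compounds from another.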