Broad Institute of MIT and Harvard, Cambridge, USA.
Biological Research Centre, Szeged, Hungary.
Nat Commun. 2023 Apr 8;14(1):1967. doi: 10.1038/s41467-023-37570-1.
Predicting assay results for compounds virtually, using chemical structures and phenotypic profiles, has the potential to reduce the time and resources required for drug-discovery screens. Here, we evaluate the relative strength of three high-throughput data sources, namely chemical structures, imaging (Cell Painting), and gene-expression profiles (L1000), to predict compound bioactivity using a historical collection of 16,170 compounds tested in 270 assays for a total of 585,439 readouts. Each of the three data modalities alone can predict compound activity for 6-10% of assays, and in combination they predict 21% of assays with high accuracy, a success rate 2 to 3 times higher than that of any single modality. In practice, predictors with lower accuracy could still be useful, and relaxing the accuracy requirement increases the share of assays that can be predicted from 37% with chemical structures alone to 64% when they are combined with phenotypic data. Our study shows that unbiased phenotypic profiling can be leveraged to enhance compound bioactivity prediction and accelerate the early stages of the drug-discovery process.
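The figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and the fill fraction assumes that each of the 585,439 readouts corresponds to a distinct compound-assay pair, which the abstract does not state explicitly.

```python
# Figures taken directly from the abstract.
n_compounds = 16_170
n_assays = 270
n_readouts = 585_439

# Assumption: each readout is one distinct compound-assay pair.
possible_pairs = n_compounds * n_assays
fill_fraction = n_readouts / possible_pairs

# Share of assays predicted with high accuracy.
single_low, single_high = 0.06, 0.10   # single-modality range
combined = 0.21                        # all three modalities combined

fold_vs_best = combined / single_high  # against the best single modality
fold_vs_worst = combined / single_low  # against the weakest single modality

# Share of assays predicted under the relaxed accuracy requirement.
relaxed_structure_only = 0.37
relaxed_combined = 0.64
relaxed_fold = relaxed_combined / relaxed_structure_only

print(f"possible compound-assay pairs: {possible_pairs:,}")
print(f"fraction of pairs with a readout: {fill_fraction:.1%}")
print(f"combined vs single modality: {fold_vs_best:.1f}x to {fold_vs_worst:.1f}x")
print(f"relaxed threshold, combined vs structures alone: {relaxed_fold:.2f}x")
```

Under that assumption, only about 13% of the compound-by-assay matrix is populated, so the collection is sparse. The combined-modality gain works out to roughly 2.1x to 3.5x, broadly in line with the "2 to 3 times" stated in the abstract.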