Borowa Adriana, Rymarczyk Dawid, Żyła Marek, Kańdula Maciej, Sánchez-Fernández Ana, Rataj Krzysztof, Struski Łukasz, Tabor Jacek, Zieliński Bartosz
Jagiellonian University, Faculty of Mathematics and Computer Science, Kraków, Poland.
Jagiellonian University, Doctoral School of Exact and Natural Sciences, Kraków, Poland.
Comput Struct Biotechnol J. 2024 Mar 12;23:1181-1188. doi: 10.1016/j.csbj.2024.02.022. eCollection 2024 Dec.
Biomedical imaging techniques such as high content screening (HCS) are valuable for drug discovery, but their high cost largely restricts their use to pharmaceutical companies. To address this issue, the JUMP-CP consortium released a massive open image dataset of chemical and genetic perturbations, providing a valuable resource for deep learning research. In this work, we use the JUMP-CP dataset to develop a universal representation model for HCS data, primarily data generated with U2OS cells and the Cell Painting protocol, using supervised and self-supervised learning approaches. We propose an evaluation protocol that assesses the performance of these models on mode of action and property prediction tasks using a popular phenotypic screening dataset. Results show that the self-supervised approach trained on data from multiple consortium partners yields a representation that is more robust to batch effects whilst achieving performance on par with standard approaches. Together with our other conclusions, these results provide recommendations on the training strategy of a representation model for HCS images.