Baillif Benoît, Wichard Joerg, Méndez-Lucio Oscar, Rouquié David
Bayer SAS, Bayer CropScience, Sophia Antipolis, France.
Department of Genetic Toxicology, Bayer AG, Berlin, Germany.
Front Chem. 2020 Apr 23;8:296. doi: 10.3389/fchem.2020.00296. eCollection 2020.
Pharmaceutical or phytopharmaceutical molecules rely on interaction with one or more specific molecular targets to induce their intended biological responses. However, these compounds are also prone to interacting with many unintended biological targets, known as off-targets. Off-target identification is difficult and expensive. Consequently, QSAR models that predict activity on a target have gained importance in drug discovery and in the de-risking of chemicals. However, only a restricted number of targets are well characterized and have enough data to build such models. A good alternative to individual target evaluations is to use integrative evaluations such as transcriptomics, obtained from compound-induced gene expression measurements in cell cultures. The advantage of these experiments is that they capture the consequences of compound interactions with many possible molecular targets and biological pathways, without any constraint on the chemical space. In this work, we assessed the value of a large public dataset of compound-induced transcriptomic data for predicting compound activity on a selection of 69 molecular targets. We compared these descriptors with other QSAR descriptors, namely Morgan fingerprints (similar to extended-connectivity fingerprints). Depending on the target, active compounds could show similar signatures in one or multiple cell lines, whether these active compounds shared similar or different chemical structures. Random forest models using gene expression signatures performed similarly to or better than counterpart models built with Morgan fingerprints for 25% of the target prediction tasks. These performances were obtained mostly with signatures produced in cell lines in which compounds active toward the considered target showed similar signatures.
We show that compound-induced transcriptomic data could represent a valuable opportunity for target prediction, making it possible to overcome the chemical space limitation of QSAR models.
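The observation that active compounds may share expression signatures while differing in chemical structure can be illustrated with two similarity measures. The following is a minimal sketch and not the authors' pipeline: it assumes that a fingerprint is represented as a set of on-bit indices (as a Morgan fingerprint could be) and that a signature is a list of per-gene differential expression values. The function names, compound names, and all values are hypothetical toy data.

```python
from math import sqrt


def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def pearson(x, y):
    """Pearson correlation between two equal-length expression signatures."""
    if len(x) != len(y):
        raise ValueError("signatures must have the same length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = sqrt(var_x * var_y)
    return cov / denom if denom else 0.0


# Toy data (invented for illustration): two compounds with little
# structural overlap but closely matching expression signatures.
fingerprints = {
    "cpd_A": {1, 5, 9, 42},
    "cpd_B": {2, 7, 9, 88, 130},
}
signatures = {
    "cpd_A": [1.8, -0.4, 2.1, -1.5, 0.2],
    "cpd_B": [1.6, -0.2, 2.4, -1.1, 0.0],
}

structural_sim = tanimoto(fingerprints["cpd_A"], fingerprints["cpd_B"])
signature_sim = pearson(signatures["cpd_A"], signatures["cpd_B"])
print(f"Tanimoto (structure): {structural_sim:.2f}")
print(f"Pearson (signature):  {signature_sim:.2f}")
```

In this toy case the structural similarity is low while the signature correlation is high, which is the situation in which a signature-based model could plausibly recover activity that a structure-based descriptor would miss.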