Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK.
Oncology Innovative Medicines and Early Development, AstraZeneca, Cambridge, UK.
Bioinformatics. 2018 Jan 1;34(1):72-79. doi: 10.1093/bioinformatics/btx525.
In silico approaches often fail to utilize bioactivity data available for orthologous targets because there is insufficient evidence highlighting the benefit of such an approach. Deeper investigation into orthologue chemical space and its influence on expanding compound and target coverage is necessary to improve confidence in this practice.
Here we present an analysis of the orthologue chemical space in ChEMBL and PubChem and its impact on target prediction. We highlight that the number of conflicting bioactivities between human and orthologue targets is low and that annotations are overall compatible. Chemical space analysis shows that compounds tested against orthologues are chemically dissimilar to those tested against human targets while showing high intra-group similarity, suggesting they could effectively extend the chemical space modelled. Based on these observations, we show the benefit of orthologue inclusion in terms of novel target coverage. We also benchmarked predictive models using a time-series split and using bioactivities from Chemistry Connect and HTS data available at AstraZeneca, showing that the inclusion of orthologue bioactivities statistically improved performance.
Orthologue-based bioactivity prediction and the compound training set are available at www.github.com/lhm30/PIDGINv2.
Supplementary data are available at Bioinformatics online.