Cho Hyein, No Kyoung Tai, Lim Hocheol
The Interdisciplinary Graduate Program in Integrative Biotechnology & Translational Medicine, Yonsei University, Incheon 21983, Republic of Korea.
Bioinformatics and Molecular Design Research Center (BMDRC), Incheon 21983, Republic of Korea.
Int J Mol Sci. 2024 Dec 30;26(1):224. doi: 10.3390/ijms26010224.
Understanding drug-target interactions is crucial for identifying novel lead compounds, enhancing efficacy, and reducing toxicity. Phenotype-based approaches, such as analyzing drug-induced gene expression changes, have proven effective in drug discovery and precision medicine. However, measuring gene expression experimentally for every relevant chemical is impractical, which limits large-scale gene expression-based screening. In this study, we developed DIGERA (Drug-Induced Gene Expression Ranking Analysis), a Lasso-based ensemble framework that uses LINCS L1000 data to predict drug-induced gene expression rankings. We created novel numerical features for chemicals, cell lines, and experimental conditions, allowing the prediction of gene expression rankings across eight key cell lines. DIGERA outperformed baseline models on the F1@K metric, demonstrating improved precision in gene expression ranking. We also combined DIGERA with an iterative fine-tuning process for de novo design, suggesting 10 PARP1 inhibitors with favorable predicted properties, including binding affinity, synthetic accessibility, solubility, membrane permeability, and drug-likeness, and with gene expression rankings similar to that of olaparib. Notably, nine compounds were novel, and six analogs of these compounds had references linked to PARP1 inhibition. These results underscore DIGERA's potential to boost model performance and robustness through novel features and ensemble learning, aiding virtual screening for new PARP1 inhibitors.
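The abstract reports performance as F1@K without defining it. As a rough illustration only, the sketch below assumes one common reading: the K genes ranked highest by the model are compared with the genes ranked highest in the measured profile, and F1 is the harmonic mean of the resulting precision and recall. The function names, the `n_relevant` parameter, and the tie-breaking rule are hypothetical and are not taken from the paper, whose exact definition may differ.

```python
from typing import Optional, Sequence, Set


def top_indices(scores: Sequence[float], n: int) -> Set[int]:
    """Return the indices of the n largest scores (ties go to the lower index)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return set(order[:n])


def f1_at_k(
    true_scores: Sequence[float],
    pred_scores: Sequence[float],
    k: int,
    n_relevant: Optional[int] = None,
) -> float:
    """F1@K under the assumed definition described in the lead-in.

    true_scores: measured expression changes, one value per gene.
    pred_scores: predicted expression changes for the same genes.
    k:           number of top predicted genes that are evaluated.
    n_relevant:  size of the "truly top-ranked" gene set (assumption:
                 defaults to k when not given).
    """
    if len(true_scores) != len(pred_scores):
        raise ValueError("true_scores and pred_scores must have equal length")
    if k <= 0:
        raise ValueError("k must be positive")
    if n_relevant is None:
        n_relevant = k

    relevant = top_indices(true_scores, n_relevant)
    retrieved = top_indices(pred_scores, k)
    hits = len(relevant & retrieved)
    if hits == 0:
        return 0.0

    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy values for six genes; these numbers are invented for illustration.
    measured = [5.0, 4.0, 3.0, 2.0, 1.0, 0.0]
    predicted = [4.5, 1.0, 3.5, 5.0, 0.5, 0.2]
    print(f1_at_k(measured, predicted, k=3))
```

When the relevant set and the retrieved set both have size K, precision and recall coincide, so under this reading F1@K reduces to the fraction of the top K genes that the two rankings share.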