Pham Thai-Hoang, Qiu Yue, Zeng Jucheng, Xie Lei, Zhang Ping
Department of Computer Science and Engineering, The Ohio State University, Columbus, 43210, USA.
Ph.D. Program in Biology, The Graduate Center, The City University of New York, New York, 10016, USA.
Nat Mach Intell. 2021 Mar;3(3):247-257. doi: 10.1038/s42256-020-00285-9. Epub 2021 Feb 1.
Phenotype-based compound screening has advantages over target-based drug discovery, but it is difficult to scale and provides little understanding of mechanism. A chemical-induced gene expression profile provides a mechanistic signature of the phenotypic response. However, the use of such data is limited by their sparseness, unreliability, and relatively low throughput. Few methods can perform phenotype-based chemical compound screening. Here, we propose DeepCE, a mechanism-driven neural network-based method that uses a graph neural network and a multi-head attention mechanism to model chemical substructure-gene and gene-gene associations in order to predict the differential gene expression profile perturbed by chemicals. Moreover, we propose a novel data augmentation method that extracts useful information from unreliable experiments in the L1000 dataset. The experimental results show that DeepCE achieves superior performance to state-of-the-art methods. The effectiveness of the gene expression profiles generated by DeepCE is further supported by comparing them with observed data in downstream classification tasks. To demonstrate the value of DeepCE, we apply it to drug repurposing for COVID-19 and generate novel lead compounds consistent with clinical evidence. Thus, DeepCE provides a potentially powerful framework for robust predictive modeling by utilizing noisy omics data and for screening novel chemicals for the modulation of a systemic response to disease.
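The abstract names multi-head attention as the mechanism that links chemical substructures to genes. As a rough illustration of that general technique only, the sketch below lets gene embeddings attend over substructure embeddings. It is not the authors' implementation: the function name `multi_head_cross_attention`, the array sizes, and the random projection matrices (standing in for learned weights) are all assumptions made for this example, and the graph neural network that would produce the substructure embeddings is omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_cross_attention(genes, substructs, n_heads, rng):
    """Hypothetical helper: each gene embedding queries the substructure embeddings.

    genes:      (n_genes, d) array of gene embeddings
    substructs: (n_sub, d) array of substructure embeddings
    Returns the attended gene features (n_genes, d) and the
    attention weights (n_heads, n_genes, n_sub).
    """
    n_genes, d = genes.shape
    assert d % n_heads == 0, "embedding size must be divisible by n_heads"
    d_head = d // n_heads

    # Assumption: random projections stand in for learned weight matrices.
    w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

    def split_heads(x):
        # (n, d) -> (n_heads, n, d_head)
        return x.reshape(x.shape[0], n_heads, d_head).transpose(1, 0, 2)

    q = split_heads(genes @ w_q)
    k = split_heads(substructs @ w_k)
    v = split_heads(substructs @ w_v)

    # Scaled dot-product attention, computed independently per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    context = weights @ v

    # Concatenate the heads back into one feature vector per gene.
    out = context.transpose(1, 0, 2).reshape(n_genes, d)
    return out, weights


rng = np.random.default_rng(0)
genes = rng.standard_normal((5, 8))       # 5 toy genes, 8-dim embeddings
substructs = rng.standard_normal((3, 8))  # 3 toy substructures
out, weights = multi_head_cross_attention(genes, substructs, n_heads=2, rng=rng)
print(out.shape, weights.shape)
```

Each row of `weights` sums to 1 over the substructures, so it can be read as how strongly one gene attends to each substructure within a given head.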