Aliper Alexander, Plis Sergey, Artemov Artem, Ulloa Alvaro, Mamoshina Polina, Zhavoronkov Alex
Insilico Medicine, ETC, B301, Johns Hopkins University, Baltimore, Maryland 21218, United States.
Datalytic Solutions, 1101 Yale Boulevard NE, Albuquerque, New Mexico 87106, United States.
Mol Pharm. 2016 Jul 5;13(7):2524-30. doi: 10.1021/acs.molpharmaceut.6b00248. Epub 2016 Jun 8.
Deep learning is rapidly advancing many areas of science and technology, with multiple success stories in image, text, voice, and video recognition, robotics, and autonomous driving. In this paper, we demonstrate how deep neural networks (DNNs) trained on large transcriptional response data sets can classify various drugs into therapeutic categories solely on the basis of their transcriptional profiles. We used the perturbation samples of 678 drugs across the A549, MCF-7, and PC-3 cell lines from the LINCS Project and linked them to 12 therapeutic use categories derived from MeSH. To train the DNN, we used both gene-level transcriptomic data and transcriptomic data processed with a pathway activation scoring algorithm, pooling samples perturbed with different concentrations of each drug for 6 and 24 hours. In both pathway-level and gene-level classification, the DNN achieved high classification accuracy and convincingly outperformed the support vector machine (SVM) model on every multiclass classification problem; however, models based on pathway-level data performed significantly better. For the first time, we demonstrate a deep neural network trained on transcriptomic data that recognizes the pharmacological properties of multiple drugs across different biological systems and conditions. We also propose using DNN confusion matrices for drug repositioning. This work is a proof of principle for applying deep learning to drug discovery and development.
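The abstract proposes using DNN confusion matrices for drug repositioning but does not describe the procedure. One plausible reading, sketched here under stated assumptions, is to treat frequent off-diagonal cells (true category X predicted as category Y) as hints that some category-X drugs share transcriptional signatures with category Y, and to flag individual drugs whose samples are mostly predicted as a category other than their annotated one. The function names, the assumption that each drug carries a single annotated category, the per-drug majority rule, and the `min_fraction` threshold are all hypothetical illustrations, not the authors' method; the drug IDs and category names in the demo are made up. Predicted labels would come from whichever trained classifier (DNN or SVM) is being evaluated.

```python
from collections import Counter, defaultdict


def confusion_matrix(true_labels, pred_labels, classes):
    """Build a confusion matrix: rows are true categories, columns are predicted ones."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, pred_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def off_diagonal_pairs(matrix, classes, min_count=1):
    """List (true, predicted, count) misclassification cells, most frequent first."""
    pairs = []
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            if i != j and count >= min_count:
                pairs.append((classes[i], classes[j], count))
    # Stable sort: ties keep row-major order.
    return sorted(pairs, key=lambda pair: -pair[2])


def repositioning_candidates(drug_ids, true_labels, pred_labels, min_fraction=0.5):
    """Flag drugs whose samples are mostly predicted as a non-annotated category.

    Assumption (hypothetical): each drug has exactly one annotated category,
    and every sample of that drug carries the same true label.
    """
    preds_by_drug = defaultdict(list)
    true_by_drug = {}
    for drug, t, p in zip(drug_ids, true_labels, pred_labels):
        preds_by_drug[drug].append(p)
        true_by_drug[drug] = t
    candidates = {}
    for drug, preds in preds_by_drug.items():
        label, count = Counter(preds).most_common(1)[0]
        if label != true_by_drug[drug] and count / len(preds) >= min_fraction:
            candidates[drug] = label
    return candidates


if __name__ == "__main__":
    # Hypothetical samples: several samples per drug (cell lines, doses, time points).
    drugs = ["drug_A"] * 3 + ["drug_B"] * 3 + ["drug_C"] * 2
    true = ["antineoplastic"] * 3 + ["cardiovascular"] * 3 + ["cns"] * 2
    pred = ["antineoplastic", "antineoplastic", "cns",
            "antineoplastic", "antineoplastic", "cardiovascular",
            "cns", "cns"]
    classes = ["antineoplastic", "cardiovascular", "cns"]

    matrix = confusion_matrix(true, pred, classes)
    print(matrix)  # → [[2, 0, 1], [2, 1, 0], [0, 0, 2]]
    print(off_diagonal_pairs(matrix, classes))
    print(repositioning_candidates(drugs, true, pred))  # → {'drug_B': 'antineoplastic'}
```

Aggregating per drug rather than per sample reflects that each drug contributes many samples across cell lines, concentrations, and time points, so a single misclassified sample is weak evidence on its own; the 0.5 majority threshold is an arbitrary choice for illustration.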