Duan Qiaonan, Reid St Patrick, Clark Neil R, Wang Zichen, Fernandez Nicolas F, Rouillard Andrew D, Readhead Ben, Tritsch Sarah R, Hodos Rachel, Hafner Marc, Niepel Mario, Sorger Peter K, Dudley Joel T, Bavari Sina, Panchal Rekha G, Ma'ayan Avi
Department of Pharmacology and Systems Therapeutics, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
Department of Genetics and Genomics Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
NPJ Syst Biol Appl. 2016;2:16015. doi: 10.1038/npjsba.2016.15. Epub 2016 Aug 4.
The library of integrated network-based cellular signatures (LINCS) L1000 data set currently comprises over a million gene expression profiles of chemically perturbed human cell lines. Through several unique intrinsic and extrinsic benchmarking schemes, we demonstrate that processing the L1000 data with the characteristic direction (CD) method significantly improves the signal-to-noise ratio compared with the MODZ method currently used to compute L1000 signatures. The CD-processed L1000 signatures are served through a state-of-the-art web-based search engine application called L1000CDS². Using two methods, the L1000CDS² search engine prioritizes thousands of small-molecule signatures, and their pairwise combinations, that are predicted to either mimic or reverse an input gene expression signature. The L1000CDS² search engine also predicts drug targets for all the small molecules profiled by the L1000 assay that we processed. Targets are predicted by computing the cosine similarity between the L1000 small-molecule signatures and a large collection of signatures extracted from the Gene Expression Omnibus (GEO) for single-gene perturbations in mammalian cells. We applied L1000CDS² to prioritize small molecules predicted to reverse expression in 670 disease signatures, also extracted from GEO, and to prioritize small molecules that can mimic expression of 22 endogenous ligand signatures profiled by the L1000 assay. As a case study to further demonstrate the utility of L1000CDS², we collected expression signatures from human cells infected with Ebola virus at 30, 60 and 120 min. Querying these signatures with L1000CDS², we identified kenpaullone, a GSK3B/CDK2 inhibitor that, as we show in subsequent experiments, has dose-dependent efficacy in inhibiting Ebola infection without causing cellular toxicity in human cell lines.
In summary, the L1000CDS² tool can be applied in many biological and biomedical settings, and it improves the extraction of knowledge from the LINCS L1000 resource.
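The abstract states that targets are predicted by cosine similarity between small-molecule signatures and single-gene perturbation signatures, and that the search engine ranks signatures that mimic or reverse a query. The following is a minimal, illustrative sketch of cosine-similarity ranking only, assuming each signature is a plain vector of values over the same genes in the same order. The function names `cosine_similarity` and `rank_signatures`, the `mode` parameter, and the toy vectors are hypothetical and are not the search engine's actual API or data. The sketch does not reproduce the CD computation itself, nor the second ranking method mentioned in the abstract.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two signature vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("signatures must cover the same genes in the same order")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero-length signature")
    return dot / (norm_a * norm_b)


def rank_signatures(
    query: List[float],
    library: Dict[str, List[float]],
    mode: str = "mimic",
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Hypothetical ranking helper: score every library signature against the query.

    mode="mimic"   -> most positively aligned signatures first (similar direction)
    mode="reverse" -> most negatively aligned signatures first (opposite direction)
    """
    if mode not in ("mimic", "reverse"):
        raise ValueError("mode must be 'mimic' or 'reverse'")
    scores = [(name, cosine_similarity(query, sig)) for name, sig in library.items()]
    # Descending similarity for mimic, ascending (most negative first) for reverse.
    ranked = sorted(scores, key=lambda item: item[1], reverse=(mode == "mimic"))
    return ranked if top_k is None else ranked[:top_k]


# Toy example with hypothetical three-gene signatures.
query = [1.0, -0.5, 0.2]
library = {
    "drug_A": [1.0, -0.5, 0.2],   # same direction as the query
    "drug_B": [-1.0, 0.5, -0.2],  # opposite direction
    "drug_C": [0.0, 1.0, 0.0],
}
print(rank_signatures(query, library, mode="mimic"))
print(rank_signatures(query, library, mode="reverse", top_k=1))
```

Target prediction can be framed the same way under this assumption: pass a small-molecule signature as `query` and a library of single-gene perturbation signatures, then read the top "mimic" hits as candidate targets. Cosine similarity ignores vector magnitude, so the ranking depends only on the direction of each signature.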