Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China.
Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China.
J Transl Med. 2017 Sep 29;15(1):198. doi: 10.1186/s12967-017-1302-9.
The Connectivity Map (CMAP) database, an important public data source for drug repositioning, archives gene expression profiles from cancer cell lines treated with and without bioactive small molecules. However, there are only one or two technical replicates for each cell line under each treatment condition. For such small-scale data, current fold-change-based methods lack statistical control when identifying differentially expressed genes (DEGs) in treated cells. In particular, a one-to-one comparison may yield too many drug-irrelevant DEGs because of random experimental factors. To tackle this problem, CMAP adopts a pattern-matching strategy to build a "connection" between disease signatures and the gene expression changes associated with drug treatments. However, many drug-irrelevant genes may blur the "connection" if all genes are used instead of pre-selected DEGs induced by the drug treatments.
We applied OneComp, a customized version of RankComp, to identify DEGs in such small-scale cell line datasets. For a given cell line, gene pairs with stable relative expression orderings (REOs) were identified in a large collection of control cell samples measured in different experiments; these pairs formed the background stable REOs. When OneComp was applied to a small-scale cell line dataset, the background stable REOs were customized by filtering out the gene pairs whose REOs were reversed in the control samples of the analyzed dataset.
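The two steps described above (building background stable REOs, then customizing them against the controls of the analyzed dataset) can be illustrated with a minimal Python sketch. This is not the OneComp implementation from the reoa repository; the function names, the `min_fraction` stability threshold, and the toy expression matrices are assumptions made for illustration only, and the subsequent DEG-calling step is not shown.

```python
import numpy as np

def stable_reos(expr, min_fraction=1.0):
    """Return the set of ordered gene pairs (hi, lo) such that gene `hi` is
    expressed above gene `lo` in at least `min_fraction` of the samples.

    expr: 2-D array, rows = genes, columns = control samples.
    min_fraction: hypothetical stability threshold (an assumption here).
    """
    n_genes = expr.shape[0]
    pairs = set()
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            if np.mean(expr[i] > expr[j]) >= min_fraction:
                pairs.add((i, j))
            elif np.mean(expr[i] < expr[j]) >= min_fraction:
                pairs.add((j, i))
    return pairs

def customize_background(background, dataset_controls):
    """Drop background pairs whose ordering is reversed in any control
    sample of the analyzed dataset.

    dataset_controls: 2-D array, rows = genes (same order as the
    background data), columns = the dataset's own control samples.
    """
    kept = set()
    for hi, lo in background:
        reversed_somewhere = np.any(dataset_controls[hi] < dataset_controls[lo])
        if not reversed_somewhere:
            kept.add((hi, lo))
    return kept

# Toy data: 3 genes measured in 3 accumulated control samples.
background_controls = np.array([
    [5.0, 6.0, 7.0],   # gene 0
    [3.0, 2.0, 4.0],   # gene 1
    [1.0, 1.0, 0.5],   # gene 2
])
# One control sample from the small-scale dataset under analysis;
# here gene 1 falls below gene 2, reversing that background pair.
dataset_controls = np.array([[5.0], [1.0], [2.0]])

background = stable_reos(background_controls)
customized = customize_background(background, dataset_controls)
```

In this toy case the pair (gene 1 > gene 2) is stable in the background but reversed in the dataset's control, so it is removed before any treated sample is compared against the background. The pairwise loop is quadratic in the number of genes and is meant only to make the logic explicit.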
In simulated data, the consistency scores of the overlapping genes between the DEGs identified by OneComp and by SAM were all higher than 99%, while the consistency score of the DEGs identified solely by OneComp was 96.85% according to the observed expression difference method. The usefulness of OneComp in drug repositioning was exemplified by identifying phenformin- and metformin-related genes from small-scale cell line datasets, which helped to support both drugs as potential anti-tumor drugs for non-small-cell lung carcinoma, whereas the pattern-matching strategy adopted by CMAP missed the two connections. The implementation of OneComp is available at https://github.com/pathint/reoa .
OneComp performed well in both the simulated and real data. It is useful in drug repositioning studies by helping to find hidden "connections" between drugs and diseases.