Lafzi Atefeh, Kazan Hilal
Department of Health Informatics, Middle East Technical University, Ankara, Turkey.
Department of Computer Engineering, Antalya International University, Antalya, Turkey.
PLoS One. 2016 May 17;11(5):e0155354. doi: 10.1371/journal.pone.0155354. eCollection 2016.
RNA-binding proteins (RBPs) play key roles in post-transcriptional regulation of mRNAs. Dysregulation of RBP-mediated mechanisms has been associated with many steps of cancer initiation and progression. Despite this, previous studies of gene expression in cancer have ignored the effect of RBPs. To address this gap, we developed a lasso regression model that predicts gene expression in cancer by incorporating RBP-mediated regulation as well as the effects of other well-studied factors such as copy-number variation, DNA methylation, transcription factors (TFs) and miRNAs. As a case study, we applied our model to lung squamous cell carcinoma (LUSC) data, because we found that several RBPs are differentially expressed in LUSC. Including RBP-mediated regulatory effects in addition to the other features significantly increased the Spearman rank correlation between predicted and measured expression of held-out genes. Using a feature selection procedure that accounts for the adaptive search employed by lasso regularization, we identified candidate regulators in LUSC. Remarkably, several of these candidate regulators are RBPs. Furthermore, the majority of the candidate regulators have previously been found to be associated with lung cancer. To investigate the mechanisms controlled by these regulators, we predicted their target gene sets based on our model, and we validated these target gene sets by comparing them against experimentally verified targets. Our results suggest that future studies of gene expression in cancer must consider the effect of RBP-mediated regulation.
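The evaluation described in the abstract (fit a lasso model on some genes, then compare the Spearman rank correlation on held-out genes with and without RBP features) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the feature counts, coefficients, and penalty `alpha` are arbitrary, and the helper names (`lasso_cd`, `spearman`) are hypothetical. It is not the authors' pipeline, and it does not reproduce their feature selection procedure.

```python
import numpy as np

def soft_threshold(z, t):
    # Soft-thresholding operator used by the lasso coordinate update.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, alpha, n_iter=200):
    # Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic
    # coordinate descent. No intercept is fitted; the synthetic data
    # below are roughly zero-mean, and Spearman is shift-invariant.
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()  # equals y - Xw while w is all zeros
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * w[j]          # remove feature j's contribution
            rho = X[:, j] @ resid / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
            resid -= X[:, j] * w[j]          # add the updated contribution back
    return w

def spearman(a, b):
    # Spearman rank correlation as the Pearson correlation of ranks.
    # Assumes no ties, which holds for continuous synthetic data.
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return np.corrcoef(ra, rb)[0, 1]

# Synthetic data: rows are genes, columns are regulatory features.
rng = np.random.default_rng(0)
n_genes, n_other, n_rbp = 400, 20, 10
X_other = rng.normal(size=(n_genes, n_other))  # stand-in for CNV, methylation, TF, miRNA features
X_rbp = rng.normal(size=(n_genes, n_rbp))      # stand-in for RBP features
w_other = np.zeros(n_other)
w_other[:3] = [1.0, -0.8, 0.5]
w_rbp = np.zeros(n_rbp)
w_rbp[:2] = [0.9, -0.7]
y = X_other @ w_other + X_rbp @ w_rbp + 0.5 * rng.normal(size=n_genes)

# Hold out the last 100 genes for evaluation.
train, test = slice(0, 300), slice(300, n_genes)
X_full = np.hstack([X_other, X_rbp])
alpha = 0.05  # arbitrary penalty; in practice chosen by cross-validation

w_fit_other = lasso_cd(X_other[train], y[train], alpha)
w_fit_full = lasso_cd(X_full[train], y[train], alpha)

rho_without = spearman(y[test], X_other[test] @ w_fit_other)
rho_with = spearman(y[test], X_full[test] @ w_fit_full)
print(f"Spearman without RBP features: {rho_without:.3f}")
print(f"Spearman with RBP features:    {rho_with:.3f}")
```

Because the synthetic expression values are constructed to depend on two of the RBP features, the model that includes them should rank held-out genes more accurately. On real data, whether the improvement is significant would have to be tested, as the abstract reports was done for LUSC.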