Liu Ying, Irie Takuma, Yada Tetsushi, Suzuki Yutaka
Department of Computational Biology and Medical Science, Graduate School of Frontier Sciences, the University of Tokyo, Chiba, Japan.
Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Fukuoka, Japan.
Nucleic Acids Res. 2017 Jul 27;45(13):e124. doi: 10.1093/nar/gkx396.
In recent years, the dramatic increase in the number of applications of massively parallel reporter assay (MPRA) technology has produced a large body of data for various purposes. However, a computational model that can be applied to decipher regulatory codes for diverse MPRAs does not yet exist. Here, we propose a new computational method to predict the transcriptional activity of MPRAs, as well as luciferase reporter assays, based on the TRANScription FACtor database (TRANSFAC). We employed regression trees and multivariate adaptive regression splines to obtain these predictions and considered a feature redundancy-dependent formula for conventional regression trees to enable adaptation to diverse data. The developed method was applicable to various MPRAs despite differences in transfected cell types, sequence lengths, construct numbers and sequence types. We demonstrate that this method can predict the transcriptional activity of promoters in HEK293 cells through predictive functions that were estimated from independent assays in eight tumor cell lines. The prediction was generally good (Pearson's r = 0.68), which suggests that active transcription factor binding sites common to different cell types make greater contributions to transcriptional activity and that, in some instances, known promoter activity could be used to infer the transcriptional activity of unknown promoters regardless of cell type.
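The general workflow described in the abstract (fit a regression tree on transcription factor binding site features, then score held-out predictions with Pearson's r) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a plain greedy regression tree on binary presence/absence features, omits multivariate adaptive regression splines and the feature redundancy-dependent formula, and runs on synthetic toy data. All names (`fit_tree`, `predict`, `simulate`, `pearson_r`) and the feature weights are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def sse(ys):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def fit_tree(X, y, depth=0, max_depth=3, min_leaf=5):
    """Grow a regression tree on binary features by greedy SSE reduction."""
    leaf = {"value": sum(y) / len(y)}
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return leaf
    best = None  # (cost, feature index) of the best split found so far
    for j in range(len(X[0])):
        absent = [yi for xi, yi in zip(X, y) if xi[j] == 0]
        present = [yi for xi, yi in zip(X, y) if xi[j] == 1]
        if len(absent) < min_leaf or len(present) < min_leaf:
            continue
        cost = sse(absent) + sse(present)
        if best is None or cost < best[0]:
            best = (cost, j)
    if best is None or best[0] >= sse(y):
        return leaf  # no admissible split improves the fit
    j = best[1]
    rows0 = [(xi, yi) for xi, yi in zip(X, y) if xi[j] == 0]
    rows1 = [(xi, yi) for xi, yi in zip(X, y) if xi[j] == 1]
    return {
        "feature": j,
        "absent": fit_tree([a for a, _ in rows0], [b for _, b in rows0],
                           depth + 1, max_depth, min_leaf),
        "present": fit_tree([a for a, _ in rows1], [b for _, b in rows1],
                            depth + 1, max_depth, min_leaf),
    }


def predict(tree, x):
    """Walk from the root to a leaf and return the leaf mean."""
    while "feature" in tree:
        tree = tree["present"] if x[tree["feature"]] else tree["absent"]
    return tree["value"]


def simulate(n, rng, weights, noise):
    """Synthetic toy data: random binding-site presence, additive activity."""
    X = [[rng.randint(0, 1) for _ in weights] for _ in range(n)]
    y = [sum(w * v for w, v in zip(weights, x)) + rng.gauss(0, noise)
         for x in X]
    return X, y


rng = random.Random(0)
weights = [2.0, 1.0, 0.0, -1.0]  # hypothetical per-site effects, not real data
X_train, y_train = simulate(200, rng, weights, 0.5)
X_test, y_test = simulate(100, rng, weights, 0.5)

tree = fit_tree(X_train, y_train)
preds = [predict(tree, x) for x in X_test]
r = pearson_r(y_test, preds)
print(f"Pearson's r on held-out toy data: {r:.2f}")
```

In this sketch the training and test sets share the same underlying site effects, loosely mirroring the idea that a predictive function estimated in one set of assays can be evaluated on an independent one. Real MPRA data would need motif scanning against a binding-site database to build the feature matrix, which is outside the scope of this example.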