Tang Jinyang, Wang Fei
Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, P. R. China.
J Bioinform Comput Biol. 2015 Dec;13(6):1542004. doi: 10.1142/S0219720015420044. Epub 2015 Oct 11.
Next-generation sequencing technologies are widely used in genome research, and RNA sequencing (RNA-Seq) is becoming the main approach to gene expression profiling. Many computational methods have been developed to identify differentially expressed (DE) genes in RNA-Seq data. However, most existing algorithms tend to call long genes as DE, and short DE genes are rarely detected. In this work, we examine the influence of gene length on RNA-Seq data analysis, in particular its effect on the variance estimation of RNA-Seq read counts, which is important for the statistical tests used to identify DE genes. We propose a balanced method that detects statistically significant short DE genes by smoothing a gene length factor. Computational experiments indicate that our method performs well. Software available: http://www.iipl.fudan.edu.cn/lenseq/.
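The length bias described in the abstract can be illustrated with a minimal sketch. This is not the method proposed in the paper; it assumes a simple Poisson read-count model in which the expected count is proportional to gene length, and the function name `expected_z` and the parameters `reads_per_bp` and `fold_change` are hypothetical values chosen only for illustration. Under these assumptions, the approximate test statistic for a fixed fold change grows with the square root of gene length, so a short gene can fall below a significance threshold that a long gene with the same fold change exceeds.

```python
import math


def expected_z(length_bp, reads_per_bp=0.05, fold_change=1.5):
    """Approximate z-statistic for comparing two Poisson counts,
    evaluated at their expected values (illustrative assumption only).

    length_bp    -- gene length in base pairs
    reads_per_bp -- hypothetical expected reads per base in condition A
    fold_change  -- hypothetical true expression ratio B / A
    """
    mu_a = reads_per_bp * length_bp   # expected count, condition A
    mu_b = fold_change * mu_a         # expected count, condition B
    # For Poisson counts the variance of the difference is mu_a + mu_b.
    return (mu_b - mu_a) / math.sqrt(mu_a + mu_b)


if __name__ == "__main__":
    # Same fold change, different gene lengths.
    for length in (500, 2000, 8000):
        print(length, round(expected_z(length), 2))
```

In this toy model, quadrupling the gene length doubles the statistic, which is one way to see why length-unaware tests favor long genes.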