Yang Shengping, Mercante Donald E, Zhang Kun, Fang Zhide
Department of Pathology, School of Medicine, Texas Tech University Health Sciences Center, Lubbock, TX, USA.; Biostatistics Program, School of Public Health, LSU Health Sciences Center, New Orleans, LA, USA.
Biostatistics Program, School of Public Health, LSU Health Sciences Center, New Orleans, LA, USA.
Cancer Inform. 2016 Jun 27;15:129-41. doi: 10.4137/CIN.S39781. eCollection 2016.
DNA copy number alteration is common in many cancers. Studies have shown that insertion or deletion of DNA sequences can directly alter gene expression, and that DNA copy number and gene expression are significantly correlated. Data normalization is a critical step in the analysis of gene expression data generated by RNA-seq technology. Successful normalization reduces or removes unwanted nonbiological variation in the data while keeping meaningful information intact. However, to our knowledge, no attempt has been made to adjust for the variation due to DNA copy number changes in RNA-seq data normalization.
In this article, we propose an integrated approach for RNA-seq data normalization. Comparisons show that the proposed normalization can improve power for downstream differentially expressed gene detection and generate more biologically meaningful results in gene profiling. In addition, our findings show that due to the effects of copy number changes, some housekeeping genes are not always suitable internal controls for studying gene expression.
By using DNA copy number information, the integrated approach reduces noise from both biological and nonbiological sources in RNA-seq data, thus increasing the accuracy of gene profiling.