Shi Xingjie, Zhao Qing, Huang Jian, Xie Yang, Ma Shuangge
Department of Statistics, Nanjing University of Finance and Economics, Nanjing, China; School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China.
Department of Biostatistics, Yale University, New Haven, CT, USA.
Bioinformatics. 2015 Dec 15;31(24):3977-83. doi: 10.1093/bioinformatics/btv518. Epub 2015 Sep 3.
Both gene expression levels (GEs) and copy number alterations (CNAs) have important biological implications. GEs are partly regulated by CNAs, and much effort has been devoted to understanding their relations. Analysing this regulation is challenging because the expression of one gene may be regulated by multiple CNAs, and one CNA may regulate the expression of multiple genes. Correlations among GEs and among CNAs complicate the analysis further. Existing methods have limitations and cannot comprehensively describe the regulation.
A sparse double Laplacian shrinkage method is developed. It jointly models the effects of multiple CNAs on multiple GEs. Penalization is adopted to achieve sparsity and identify the regulation relationships. Network adjacency is computed to describe the interconnections among GEs and among CNAs. Two Laplacian shrinkage penalties are imposed to accommodate the network adjacency measures, one for the CNA network and one for the GE network. Simulation shows that the proposed method outperforms the competing alternatives, with more accurate marker identification. The Cancer Genome Atlas data are analysed to further demonstrate the advantages of the proposed method.
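To make the structure of such an objective concrete, the following is a minimal sketch in Python, not the authors' R implementation. It assumes a linear model in which GEs are regressed on CNAs through a coefficient matrix B, uses an L1 term as a stand-in for the sparsity penalty (the abstract does not name one), and builds adjacency from thresholded absolute Pearson correlations. All function names, the `cutoff` parameter, and the tuning weights `lam_sparse`, `lam_cna`, and `lam_ge` are hypothetical.

```python
# Illustrative sketch only: the L1 sparsity term, the thresholded |correlation|
# adjacency, and every name below are assumptions, not the authors' R code.
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0.0 or sv == 0.0:
        return 0.0
    return cov / (su * sv)


def adjacency(columns, cutoff=0.5):
    """Adjacency a_jk = |corr(col_j, col_k)| when above the cutoff, else 0."""
    m = len(columns)
    A = [[0.0] * m for _ in range(m)]
    for j in range(m):
        for k in range(j + 1, m):
            r = abs(pearson(columns[j], columns[k]))
            if r > cutoff:
                A[j][k] = A[k][j] = r
    return A


def laplacian_quadratic(x, A):
    """x^T L x with L = D - A, computed as sum over j<k of a_jk * (x_j - x_k)^2."""
    m = len(x)
    return sum(A[j][k] * (x[j] - x[k]) ** 2
               for j in range(m) for k in range(j + 1, m))


def objective(X_cols, Y_cols, B, lam_sparse, lam_cna, lam_ge, cutoff=0.5):
    """Penalized least squares; B[j][k] is the effect of CNA j on GE k.

    X_cols and Y_cols hold one list of sample values per CNA and per GE.
    """
    n, p, q = len(X_cols[0]), len(X_cols), len(Y_cols)
    # Squared-error loss summed over samples and genes.
    loss = 0.0
    for k in range(q):
        for i in range(n):
            fit = sum(X_cols[j][i] * B[j][k] for j in range(p))
            loss += (Y_cols[k][i] - fit) ** 2
    loss /= 2 * n
    # Sparsity term (L1 used here as a stand-in).
    sparse = sum(abs(B[j][k]) for j in range(p) for k in range(q))
    A_cna = adjacency(X_cols, cutoff)
    A_ge = adjacency(Y_cols, cutoff)
    # Laplacian penalty 1: connected CNAs should have similar effects on a GE.
    pen_cna = sum(laplacian_quadratic([B[j][k] for j in range(p)], A_cna)
                  for k in range(q))
    # Laplacian penalty 2: one CNA should have similar effects on connected GEs.
    pen_ge = sum(laplacian_quadratic(B[j], A_ge) for j in range(p))
    return loss + lam_sparse * sparse + lam_cna * pen_cna + lam_ge * pen_ge
```

In this sketch the two Laplacian terms shrink the difference between coefficients of strongly connected CNAs (within a column of B) and of strongly connected GEs (within a row of B), while the sparsity term sets weak effects to zero. A signed adjacency or a different sparsity penalty would change only `adjacency` and the `sparse` line.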
R code is available at http://works.bepress.com/shuangge/49/.