Sahakyan Aleksandr B, Balasubramanian Shankar
Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK.
Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK.
BMC Genomics. 2016 Mar 12;17:225. doi: 10.1186/s12864-016-2582-9.
The role of random mutations and genetic errors in defining the etiology of cancer and other multigenic diseases has recently received much attention. On the view that complex genes should be particularly vulnerable to such events, here we explore the link between simple properties of human genes (transcript length, number of splice variants, exon/intron composition) and their involvement in pathways linked to cancer and other multigenic diseases.
We reveal a substantial enrichment of cancer pathways with long genes and with genes that have multiple splice variants. Although these two factors are interdependent, we show that overall gene length and splicing complexity increase in cancer pathways in a partially decoupled manner. Our systematic survey of pathways enriched with the longest genes and with genes that have multiple splice variants reveals, along with cancer pathways, pathways involved in various neuronal processes, cardiomyopathies and type II diabetes. We also outline a correlation between gene length and the number of somatic mutations.
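The abstract does not say which statistical test underlies the enrichment claims. Purely as a hedged illustration, one common way to ask whether a pathway holds more long genes than chance would predict is a one-sided hypergeometric test. The sketch below assumes a user-supplied gene universe, a hypothetical "top fraction by transcript length" cutoff for calling a gene long, and pathway gene sets taken from some pathway database. The function names `top_fraction`, `hypergeom_upper_tail` and `pathway_enrichment_p` are hypothetical and do not come from the paper.

```python
from math import comb


def top_fraction(lengths: dict[str, int], fraction: float) -> set[str]:
    """Return the genes in the top `fraction` by transcript length.

    The cutoff is a hypothetical choice for illustration, not the paper's.
    """
    ranked = sorted(lengths, key=lengths.get, reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    return set(ranked[:n_top])


def hypergeom_upper_tail(total: int, flagged: int, drawn: int, observed: int) -> float:
    """P(X >= observed) for X ~ Hypergeometric(total, flagged, drawn).

    total    -- number of genes in the universe
    flagged  -- number of "long" genes in the universe
    drawn    -- number of genes in the pathway
    observed -- number of long genes found in the pathway
    """
    upper = min(flagged, drawn)
    tail = sum(
        comb(flagged, k) * comb(total - flagged, drawn - k)
        for k in range(observed, upper + 1)
    )
    return tail / comb(total, drawn)


def pathway_enrichment_p(universe: set[str], long_genes: set[str], pathway: set[str]) -> float:
    """One-sided enrichment p-value of long genes in a single pathway."""
    pathway_in_universe = pathway & universe
    return hypergeom_upper_tail(
        total=len(universe),
        flagged=len(long_genes & universe),
        drawn=len(pathway_in_universe),
        observed=len(long_genes & pathway_in_universe),
    )


if __name__ == "__main__":
    # Toy data only; not values from the study.
    lengths = {"A": 100, "B": 500, "C": 300, "D": 50}
    long_set = top_fraction(lengths, 0.5)
    print(sorted(long_set))
    print(pathway_enrichment_p(set(lengths), long_set, {"B", "C"}))
```

The same test applies unchanged to genes with multiple splice variants: swap in a different flagged set. With many pathways, the resulting p-values would normally also need a multiple-testing correction, which this sketch omits.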
Our work is a step forward in the assessment of the role of simple gene characteristics in cancer and a wider range of multigenic diseases. We demonstrate a significant accumulation of long genes and genes with multiple splice variants in pathways of multigenic diseases that have already been associated with de novo mutations. Unlike the cancer pathways, we note that the pathways of neuronal processes, cardiomyopathies and type II diabetes contain genes long enough for topoisomerase-dependent gene expression to also be a potential contributing factor in the emergence of pathologies, should topoisomerases become impaired.