Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, College of Life Science, East China Normal University, Shanghai, China.
PLoS One. 2011;6(11):e28318. doi: 10.1371/journal.pone.0028318. Epub 2011 Nov 30.
During expression, genes can generate diverse functional products, including various protein-coding and noncoding RNAs. Here, using two transcriptome sequencing datasets from human brain tissues and 10 mixed cell lines, we investigated the protein-coding capacities and expression levels of the isoforms of known human genes, as well as the conservation and disease association of long noncoding RNAs (ncRNAs). Comparative analysis revealed that about two-thirds of the genes expressed in brain and cell lines were the same, but fewer than one-third of their isoforms were identical. Apart from genes expressed specifically in brain or in the cell lines, about 66% of the commonly expressed genes encoded different isoforms. Moreover, most genes predominantly expressed one isoform, and some genes generated protein-coding (or noncoding) RNAs in only one of the two samples. We found that 282 human genes could give rise to both protein-coding and noncoding RNAs through alternative splicing in the two samples. We also identified more than 1,000 long ncRNAs, most of which contain elements conserved across 46 vertebrates, 33 placental mammals, or 10 primates. Further analysis showed that some long ncRNAs were differentially expressed in human breast cancer or lung cancer, and several of these were validated by RT-PCR. In addition, the validated differentially expressed long ncRNAs were significantly correlated with certain breast cancer-related or lung cancer-related genes, indicating a biologically relevant link between long ncRNAs and human cancers. Our findings reveal that differences in gene expression profiles between samples result mainly from the expressed gene isoforms, and they highlight the importance of studying genes at the isoform level to fully characterize the intricate transcriptome.