Zhang Shi-Jian, Wang Chenqu, Yan Shouyu, Fu Aisi, Luan Xuke, Li Yumei, Sunny Shen Qing, Zhong Xiaoming, Chen Jia-Yu, Wang Xiangfeng, Chin-Ming Tan Bertrand, He Aibin, Li Chuan-Yun
Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China.
Department of Crop Genomics and Bioinformatics, College of Agronomy and Biotechnology, China Agricultural University, Beijing, China.
Mol Biol Evol. 2017 Oct 1;34(10):2453-2468. doi: 10.1093/molbev/msx212.
Recent RNA-seq studies have revealed thousands of splicing events that are evolving rapidly in primates, but the reliability of these events, as well as their combination at the isoform level, has not been adequately addressed because of the limited read length of RNA-seq. Here, we performed comparative transcriptome analyses of human and rhesus macaque cerebellum using single-molecule long-read sequencing (Iso-seq) and matched RNA-seq. In addition to 359 million RNA-seq reads, 4,165,527 Iso-seq reads were generated with a mean length of 14,875 bp, covering 11,466 human genes and 10,159 macaque genes. With the Iso-seq data, we substantially expanded the repertoire of alternative RNA processing events in primates and found that intron retention and alternative polyadenylation are surprisingly more prevalent in primates than previously estimated. We then investigated how these alternative events combine at the whole-transcript level and found that they combine largely independently along the transcript, giving rise to thousands of novel isoforms missed by current annotations. Notably, these novel isoforms are in general selectively constrained, and 1,119 of them are expressed at higher levels than the previously annotated major isoforms in human, indicating that the complexity of the human transcriptome is still significantly underestimated. Comparative transcriptome analysis further revealed 502 genes encoding selectively constrained, lineage-specific isoforms in human but not in rhesus macaque, linking them to some lineage-specific functions. Overall, we propose that the independent combination of alternative RNA processing events has contributed to complex isoform evolution in primates, which provides a new foundation for studying phenotypic differences among primates.