Yang Yang, Yang Yu-Cheng T, Yuan Jiapei, Lu Zhi John, Li Jingyi Jessica
PKU-Tsinghua-NIBS Graduate Program, School of Life Sciences, Peking University, Beijing 100871, China.
MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Center for Plant Biology and Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing 100084, China.
Nucleic Acids Res. 2017 Feb 28;45(4):1657-1672. doi: 10.1093/nar/gkw1256.
Distinguishing cell states based only on gene expression data remains a challenging task. This is true even for analyses within a species. In cross-species comparisons, the results obtained by different groups have varied widely. Here, we integrate RNA-seq data from more than 40 cell and tissue types of four mammalian species to identify sets of associated genes as indicators for specific cell states in each species. We employ a statistical method, TROM, to identify both protein-coding and non-coding indicators. Next, we map the cell states within each species and also between species using these indicator genes. We recapitulate known phenotypic similarities between related cell and tissue types and reveal the molecular basis for their similarity. We also report novel associations, with functional support, between several tissues and cell types. Moreover, the conserved associated genes we identify are found to be a good resource for studying cell differentiation and reprogramming. Lastly, long non-coding RNAs can serve well as associated genes to indicate cell states. We further infer the biological functions of those non-coding associated genes based on their co-expressed protein-coding genes. This study demonstrates that combining statistical modeling with public RNA-seq data can be powerful for improving our understanding of cell identity control.