Iyer Matthew K, Niknafs Yashar S, Malik Rohit, Singhal Udit, Sahu Anirban, Hosono Yasuyuki, Barrette Terrence R, Prensner John R, Evans Joseph R, Zhao Shuang, Poliakov Anton, Cao Xuhong, Dhanasekaran Saravana M, Wu Yi-Mi, Robinson Dan R, Beer David G, Feng Felix Y, Iyer Hariharan K, Chinnaiyan Arul M
[1] Michigan Center for Translational Pathology, University of Michigan, Ann Arbor, Michigan, USA. [2] Department of Computational Medicine and Bioinformatics, Ann Arbor, Michigan, USA.
[1] Michigan Center for Translational Pathology, University of Michigan, Ann Arbor, Michigan, USA. [2] Department of Cellular and Molecular Biology, University of Michigan, Ann Arbor, Michigan, USA.
Nat Genet. 2015 Mar;47(3):199-208. doi: 10.1038/ng.3192. Epub 2015 Jan 19.
Long noncoding RNAs (lncRNAs) are emerging as important regulators of tissue physiology and disease processes including cancer. To delineate genome-wide lncRNA expression, we curated 7,256 RNA sequencing (RNA-seq) libraries from tumors, normal tissues and cell lines comprising over 43 Tb of sequence from 25 independent studies. We applied ab initio assembly methodology to this data set, yielding a consensus human transcriptome of 91,013 expressed genes. Over 68% (58,648) of genes were classified as lncRNAs, of which 79% were previously unannotated. About 1% (597) of the lncRNAs harbored ultraconserved elements, and 7% (3,900) overlapped disease-associated SNPs. To prioritize lineage-specific, disease-associated lncRNA expression, we employed non-parametric differential expression testing and nominated 7,942 lineage- or cancer-associated lncRNA genes. The lncRNA landscape characterized here may shed light on normal biology and cancer pathogenesis and may be valuable for future biomarker development.