Department of Pharmacology and Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
Nat Genet. 2018 Oct;50(10):1474-1482. doi: 10.1038/s41588-018-0207-8. Epub 2018 Sep 17.
The functions of most long non-coding RNAs (lncRNAs) are unknown. In contrast to proteins, lncRNAs with similar functions often lack linear sequence homology; thus, the identification of function in one lncRNA rarely informs the identification of function in others. We developed a sequence comparison method to deconstruct linear sequence relationships in lncRNAs and evaluate similarity based on the abundance of short motifs called k-mers. We found that lncRNAs of related function often had similar k-mer profiles despite lacking linear homology, and that k-mer profiles correlated with protein binding to lncRNAs and with their subcellular localization. Using a novel assay to quantify Xist-like regulatory potential, we directly demonstrated that evolutionarily unrelated lncRNAs can encode similar function through different spatial arrangements of related sequence motifs. K-mer-based classification is a powerful approach to detect recurrent relationships between sequence and function in lncRNAs.
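The comparison idea described above, scoring two transcripts by the abundance of their k-mers rather than by alignment, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `kmer_profile` and `pearson`, the choice of k = 3, the normalization by window count, and the use of Pearson correlation as the similarity score are all assumptions made for illustration, and the published method may normalize and score profiles differently.

```python
from itertools import product
from math import sqrt


def kmer_profile(seq, k=3):
    """Return k-mer frequencies over the ACGT alphabet in a fixed order.

    Hypothetical helper: counts every overlapping window of length k and
    divides by the number of windows so sequences of different length
    are comparable.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA input alike
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    n_windows = max(len(seq) - k + 1, 0)
    for i in range(n_windows):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing N or other symbols
            counts[window] += 1
    scale = n_windows or 1  # avoid division by zero for very short input
    return [counts[km] / scale for km in kmers]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Toy sequences (invented for this example, not real lncRNAs):
# seq_a and seq_b contain the same motifs in a different linear order,
# while seq_c is built from unrelated motifs.
seq_a = "GGGAAA" * 5 + "CCCTTT" * 5
seq_b = "CCCTTT" * 5 + "GGGAAA" * 5
seq_c = "ACGT" * 15

sim_ab = pearson(kmer_profile(seq_a), kmer_profile(seq_b))
sim_ac = pearson(kmer_profile(seq_a), kmer_profile(seq_c))
print(f"a vs b: {sim_ab:.3f}")
print(f"a vs c: {sim_ac:.3f}")
```

In this toy case the rearranged pair scores much higher than the unrelated pair, even though a linear alignment of `seq_a` and `seq_b` would match only half of each sequence at a time. That mirrors the abstract's point that related motif content can be detected when the spatial arrangement differs.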