McKerrow Wilson H, Savva Yiannis A, Rezaei Ali, Reenan Robert A, Lawrence Charles E
Division of Applied Mathematics, Brown University, Providence, 02912, RI, USA.
Molecular Biology, Cell Biology and Biochemistry, Brown University, Providence, 02912, RI, USA.
BMC Genomics. 2017 Jul 10;18(1):522. doi: 10.1186/s12864-017-3898-9.
Repetitive elements, including self-complementary sequences that form double-stranded (ds) RNA, are now known to have cellular functions. Numerous pathways determine the fate of endogenous dsRNA, and misregulation of endogenous dsRNA is a driver of autoimmune disease, particularly in the brain. Unfortunately, aligning high-throughput, short-read sequences to repeat elements poses a dilemma: such sequences may align equally well to multiple genomic locations. To differentiate repeat elements, current alignment methods depend on sequence variation in the reference genome, and reads are discarded when no such variation is present. However, RNA hyper-editing, a possible fate for dsRNA, introduces enough variation to distinguish between repeats that are otherwise identical.
To take advantage of this variation, we developed a new algorithm, RepProfile, that simultaneously aligns reads and predicts novel variations. RepProfile accurately aligns hyper-edited reads that other methods discard. In particular, we predict hyper-editing of Drosophila melanogaster repeat elements in vivo at levels previously described only in vitro, and we validate these predictions by Sanger sequencing of sixty-two individual cloned sequences. We find that hyper-editing is concentrated in genes involved in cell-cell communication at the synapse, including some that are associated with neurodegeneration. We also find that hyper-editing tends to occur in short runs.
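The idea that editing-derived variation can place reads that a reference-only aligner cannot may be illustrated with a toy sketch. This is not the RepProfile algorithm; it is a minimal illustration under stated assumptions: two repeat copies share an identical reference sequence, a few reads are assumed to be already "anchored" to one copy (for example by unique flanking sequence), and the error rate, pseudocount, sequences, and all function names are hypothetical. Per-site A-to-G rates are estimated from the anchored reads, and an ambiguous read is then assigned by its posterior probability under each copy's editing profile.

```python
from typing import List

ERR = 0.01  # assumed per-base sequencing error rate (illustrative value)


def estimate_profile(ref: str, reads: List[str], pseudo: float = 0.5) -> List[float]:
    """Estimate the per-site A-to-G rate for one repeat copy from anchored reads."""
    profile = []
    for i, ref_base in enumerate(ref):
        if ref_base != "A":
            profile.append(0.0)  # only reference A sites can show A-to-G editing
            continue
        g = sum(r[i] == "G" for r in reads)
        a = sum(r[i] == "A" for r in reads)
        # Pseudocounts keep the estimate away from exactly 0 or 1.
        profile.append((g + pseudo) / (g + a + 2 * pseudo))
    return profile


def read_likelihood(read: str, ref: str, profile: List[float]) -> float:
    """Probability of a read given the reference and a copy's editing profile."""
    p = 1.0
    for base, ref_base, edit in zip(read, ref, profile):
        if ref_base == "A":
            if base == "G":    # edited site read correctly, or unedited site misread
                p *= edit * (1 - ERR) + (1 - edit) * ERR / 3
            elif base == "A":  # unedited site read correctly, or edited site misread
                p *= (1 - edit) * (1 - ERR) + edit * ERR / 3
            else:
                p *= ERR / 3
        else:
            p *= (1 - ERR) if base == ref_base else ERR / 3
    return p


def posterior(read: str, ref: str, profiles: List[List[float]]) -> List[float]:
    """Posterior over copies for one read, assuming equal prior weight per copy."""
    likes = [read_likelihood(read, ref, prof) for prof in profiles]
    total = sum(likes)
    return [like / total for like in likes]


# Both copies have the same reference sequence, so the reference alone
# cannot tell them apart.
ref = "CATAGAAT"
anchored_copy0 = ["CGTGGAGT", "CGTGGGAT"]  # hypothetical reads from an edited copy
anchored_copy1 = ["CATAGAAT", "CATAGAAT"]  # hypothetical reads from an unedited copy
profiles = [estimate_profile(ref, anchored_copy0),
            estimate_profile(ref, anchored_copy1)]

edited_post = posterior("CGTGGAGT", ref, profiles)    # heavily A-to-G edited read
unedited_post = posterior("CATAGAAT", ref, profiles)  # read matching the reference
print(edited_post, unedited_post)
```

In this toy setting the edited read is placed mostly on the first copy and the unedited read mostly on the second, even though a comparison against the reference alone scores both copies identically.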
Previous studies of RNA hyper-editing discarded ambiguously aligned reads and thereby ignored hyper-editing in long, perfect dsRNA, the ideal substrate for hyper-editing. We provide a method that, as simulation and Sanger validation show, accurately predicts such RNA editing, yielding a more complete picture of hyper-editing.