Clutterbuck D R, Leroy A, O'Connell M A, Semple C A M
MRC Human Genetics Unit, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK.
Bioinformatics. 2005 Jun 1;21(11):2590-5. doi: 10.1093/bioinformatics/bti411. Epub 2005 Mar 29.
Recent studies have demonstrated widespread adenosine-to-inosine (A-to-I) RNA editing in non-coding sequence. However, the extent of editing in coding sequences has remained unknown. For many of the known sites, editing can be observed in multiple species and often occurs in well-conserved sequences. In addition, edited sites often occur within imperfect inverted repeats and in clusters. Here we present a bioinformatic approach to identify novel sites based on these shared features. Mismatches between genomic and expressed sequences were filtered to remove the main sources of false positives, and then prioritized based on these features. This protocol is tailored to identifying specific recoding editing sites, rather than sites in non-coding repeat sequences.
Our protocol is more sensitive for identifying known coding editing sites than any previously published mammalian screen. A novel multiply edited transcript, BC10, was identified and experimentally verified. BC10 is highly conserved across a range of metazoa and has been implicated in two forms of cancer.
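The mismatch-then-prioritize idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not specify the filters or the scoring, so the function names, the pre-aligned input, and the `window` parameter are all hypothetical, and only the clustering feature is modelled (conservation and inverted-repeat checks are omitted). It relies on the fact that inosine is read as guanosine during sequencing, so an edited site appears as a genomic A aligned to an expressed G.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Mismatch:
    pos: int        # 0-based position in the aligned genomic sequence
    genomic: str    # base in the genome
    expressed: str  # base in the expressed (cDNA/EST) sequence


def find_a_to_g_mismatches(genomic: str, expressed: str) -> List[Mismatch]:
    """Return positions where the genome has A and the transcript has G.

    Assumes the two sequences are already aligned without gaps
    (a simplification made for this sketch).
    """
    if len(genomic) != len(expressed):
        raise ValueError("sequences must be pre-aligned and of equal length")
    return [
        Mismatch(i, g, e)
        for i, (g, e) in enumerate(zip(genomic.upper(), expressed.upper()))
        if g == "A" and e == "G"
    ]


def cluster_sizes(mismatches: List[Mismatch], window: int = 30) -> Dict[int, int]:
    """For each mismatch, count mismatches (itself included) within `window` bases."""
    positions = [m.pos for m in mismatches]
    return {p: sum(1 for q in positions if abs(p - q) <= window) for p in positions}


def prioritize(mismatches: List[Mismatch], window: int = 30) -> List[Mismatch]:
    """Rank candidates so that mismatches in larger clusters come first."""
    sizes = cluster_sizes(mismatches, window)
    return sorted(mismatches, key=lambda m: (-sizes[m.pos], m.pos))


if __name__ == "__main__":
    genome = "ACGTACGTAAAC"
    transcript = "GCGTACGTAGGC"
    for candidate in prioritize(find_a_to_g_mismatches(genome, transcript), window=3):
        print(candidate)
```

In a real screen the candidate list would first be filtered for the main sources of false positives named in the abstract before any ranking; the toy example only shows how clustered A-to-G mismatches could be ranked ahead of isolated ones.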