School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, UK.
James Hutton Institute, Invergowrie, DD2 5DA, UK.
Genome Biol. 2021 Mar 1;22(1):72. doi: 10.1186/s13059-021-02296-0.
Transcription of eukaryotic genomes involves complex alternative processing of RNAs. Sequencing of full-length RNAs using long reads reveals the true complexity of processing. However, the relatively high error rates of long-read sequencing technologies can reduce the accuracy of intron identification. Here we apply alignment metrics and machine-learning-derived sequence information to filter spurious splice junctions from long-read alignments and use the remaining junctions to guide realignment in a two-pass approach. This method, available in the software package 2passtools (https://github.com/bartongroup/2passtools), improves the accuracy of spliced alignment and transcriptome assembly for species both with and without existing high-quality annotations.
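The filtering step between the two alignment passes can be illustrated with a minimal sketch. This is not the 2passtools implementation or its API: the abstract describes filters built from alignment metrics and machine-learning-derived sequence information, whereas the stand-in below uses only two simple heuristics, read support and a canonical splice-site motif. The `Junction` type, `filter_junctions`, `to_bed_lines`, and the `min_reads` threshold are all hypothetical names chosen for illustration.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple

# Common donor+acceptor dinucleotide pairs (GT-AG, GC-AG, AT-AC).
CANONICAL_MOTIFS = {"GTAG", "GCAG", "ATAC"}


class Junction(NamedTuple):
    """Hypothetical record for one splice junction seen in a first-pass alignment."""
    chrom: str
    start: int   # 0-based intron start
    end: int     # intron end (exclusive)
    strand: str
    motif: str   # donor + acceptor dinucleotides, transcript orientation


def filter_junctions(observed: Iterable[Junction], min_reads: int = 2) -> List[Junction]:
    """Keep junctions with enough supporting reads and a canonical motif.

    Assumption: one Junction is emitted per supporting read, so counting
    duplicates gives the read support. This is a simplified heuristic,
    not the filter described in the paper.
    """
    support = Counter(observed)
    return sorted(
        j for j, n in support.items()
        if n >= min_reads and j.motif in CANONICAL_MOTIFS
    )


def to_bed_lines(junctions: Iterable[Junction]) -> List[str]:
    """Format kept junctions as 6-column BED lines to guide a second alignment pass."""
    return [f"{j.chrom}\t{j.start}\t{j.end}\t.\t0\t{j.strand}" for j in junctions]


# Toy first-pass output: one well-supported junction, one likely misaligned
# neighbour with a non-canonical motif, and one singleton.
first_pass = (
    [Junction("chr1", 100, 200, "+", "GTAG")] * 3
    + [Junction("chr1", 100, 205, "+", "CTAG")]
    + [Junction("chr1", 300, 400, "-", "GTAG")]
)
kept = filter_junctions(first_pass, min_reads=2)
bed = to_bed_lines(kept)
```

In a two-pass workflow, the resulting junction list would be passed to the aligner for the second pass so that reads are preferentially aligned across the retained introns.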