Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China.
College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China.
Bioinformatics. 2024 Jun 3;40(6). doi: 10.1093/bioinformatics/btae336.
The human genome contains many clustered, transcriptionally active regions in which the transcription complex fails to terminate at the termination site of the upstream gene and instead continues to transcribe the intergenic region and downstream genes, producing read-through transcripts. Several studies have demonstrated regulatory roles for read-through transcripts in tumorigenesis and development. However, because of the limited read length of next-generation sequencing, discovery of read-through transcripts has been slow. For third-generation sequencing data, which are long but error-prone, this study developed a novel minimizer sketch algorithm to identify read-through transcripts accurately and quickly.
Readon first splits the reference sequence into distinct active regions. Within each region it slides a window along the sequence, calculates minimizers, and constructs specialized structured arrays for query indexing. Candidate read-through transcripts are first screened by their initial alignment anchors and then pass through further confirmation steps. Comparative assessments against existing software show that Readon performs better on both simulated and validated real data. Additionally, two downstream tools are provided: one predicts whether a read-through transcript is likely to undergo nonsense-mediated decay or to encode a protein, and the other visualizes splicing patterns.
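To make the minimizer idea concrete, the following is a minimal sketch of window-based minimizer selection and anchor counting per region. It is not Readon's implementation: the function names, the lexicographic k-mer ordering, the values of `k` and `w`, the toy sequences, and the plain dictionary index (used here in place of the structured arrays mentioned above) are all assumptions chosen for illustration.

```python
from collections import defaultdict


def minimizers(seq, k=5, w=4):
    """Return the set of (kmer, position) minimizers of seq.

    For every window of w consecutive k-mers, the lexicographically
    smallest k-mer is kept (leftmost on ties). This ordering is an
    illustrative assumption; real tools often use a hash instead.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        offset = min(range(w), key=lambda j: (window[j], j))
        picked.add((window[offset], start + offset))
    return picked


def build_index(regions, k=5, w=4):
    """Map each minimizer k-mer to the (region name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in regions.items():
        for kmer, pos in minimizers(seq, k, w):
            index[kmer].append((name, pos))
    return index


def anchor_counts(read, index, k=5, w=4):
    """Count, per region, how many indexed positions share a minimizer k-mer with the read."""
    counts = defaultdict(int)
    for kmer, _ in minimizers(read, k, w):
        for name, _pos in index.get(kmer, ()):
            counts[name] += 1
    return dict(counts)


# Hypothetical toy regions standing in for an upstream and a downstream gene.
regions = {
    "geneA": "ACGTTGCAAGCTTAGGCTAACG",
    "geneB": "TTGACCGGTATCCGATGGCATT",
}
index = build_index(regions)

# A toy read spanning the end of geneA and the start of geneB.
read = regions["geneA"][-12:] + regions["geneB"][:12]
print(anchor_counts(read, index))
```

A read that collects anchors in both an upstream and a downstream region is the kind of candidate that would then go on to the confirmation steps. In this sketch a shared minimizer only shows that the k-mer occurs in both sequences, so anchor positions and their collinearity would still need to be checked.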
Readon is freely available on GitHub (https://github.com/Bulabula45/Readon).