On de novo Bridging Paired-end RNA-seq Data.

Authors

Li Xiang, Shao Mingfu

Publication information

ArXiv. 2023 Mar 27:arXiv:2303.15594v1.

PMID:37033458
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10081347/

Abstract

High-throughput short-read RNA-seq protocols often produce paired-end reads, with the middle portion of each fragment left unsequenced. We explore whether the full-length fragments can be computationally reconstructed from the two sequenced ends in the absence of a reference genome, a problem we refer to here as de novo bridging. Solving this problem provides longer, more informative RNA-seq reads and benefits downstream RNA-seq analyses such as transcript assembly, expression quantification, and differential splicing analysis. However, de novo bridging is a challenging and complicated task owing to alternative splicing, transcript noise, and sequencing errors. It remains unclear whether the data provide sufficient information for accurate bridging, let alone whether efficient algorithms exist that determine the true bridges. Methods have been proposed to bridge paired-end reads in the presence of a reference genome (called reference-based bridging), but these algorithms are far from scaling to de novo bridging, as the underlying compacted de Bruijn graph (cdBG) used in the latter task often contains millions of vertices and edges. We designed a new truncated Dijkstra's algorithm for this problem, and proposed a novel algorithm that reuses the shortest path tree to avoid running the truncated Dijkstra's algorithm from scratch for every vertex, for a further speedup. These techniques result in scalable algorithms that can bridge all paired-end reads in a cdBG with millions of vertices. Our experiments showed that paired-end RNA-seq reads can be accurately bridged to a large extent. The resulting tool is freely available at https://github.com/Shao-Group/rnabridge-denovo.
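
The abstract names a truncated Dijkstra's algorithm but gives no details, so the following is only a generic sketch of the idea of truncation, not the authors' method. It assumes a toy directed graph whose edge weights stand in for sequence lengths, and it assumes the search is cut off at a distance bound `max_dist` (for example, a plausible maximum fragment length). The names `truncated_dijkstra`, `path_to`, and `max_dist` are hypothetical, and the shortest-path-tree reuse mentioned in the abstract is not shown.

```python
import heapq


def truncated_dijkstra(graph, source, max_dist):
    """Dijkstra's algorithm that never explores beyond max_dist.

    graph: dict mapping vertex -> list of (neighbor, non-negative weight).
    Returns (dist, parent) restricted to vertices reachable within max_dist.
    Assumption: the distance bound is the truncation rule; the paper may
    use a different criterion.
    """
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd > max_dist:
                continue  # truncation: skip anything past the bound
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent


def path_to(parent, target):
    """Walk the shortest-path tree back from target; None if unreachable."""
    if target not in parent:
        return None
    path = []
    while target is not None:
        path.append(target)
        target = parent[target]
    return path[::-1]


# Toy graph: vertices play the role of cdBG nodes, weights of lengths.
toy_graph = {
    "A": [("B", 50), ("C", 120)],
    "B": [("D", 80)],
    "C": [("D", 30)],
    "D": [("E", 400)],
}
dist, parent = truncated_dijkstra(toy_graph, "A", max_dist=300)
print(dist)
print(path_to(parent, "D"))
print(path_to(parent, "E"))
```

With a bound of 300, vertex `E` is never reached, so the search touches only the neighborhood of the source. On a graph with millions of vertices, limiting each search to a bounded neighborhood is what keeps the per-source cost small.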
