Clair3-RNA: A deep learning-based small variant caller for long-read RNA sequencing data.

Author information

Zheng Zhenxian, Yu Xian, Chen Lei, Lee Yan-Lam, Xin Cheng, Wong Angel On Ki, Jain Miten, Kesharwani Rupesh K, Sedlazeck Fritz J, Luo Ruibang

Affiliations

Department of Computer Science, School of Computing and Data Science, University of Hong Kong, Hong Kong, China.

Department of Bioengineering, Northeastern University, Boston, MA, USA.

Publication information

bioRxiv. 2025 Jan 3:2024.11.17.624050. doi: 10.1101/2024.11.17.624050.

Abstract

Variant calling using long-read RNA sequencing (lrRNA-seq) can be applied to diverse tasks, such as capturing full-length isoforms and gene expression profiling. It poses challenges, however, because of factors such as higher error rates than DNA data, the complexity of transcript diversity, and RNA editing events. In this paper, we propose Clair3-RNA, the first deep learning-based variant caller tailored for lrRNA-seq data. Clair3-RNA leverages the strengths of the Clair series' pipelines and incorporates several techniques optimized for lrRNA-seq data, such as uneven coverage normalization, refinement of training materials, editing site discovery, and the incorporation of haplotype phasing to enhance variant-calling performance. Clair3-RNA supports multiple platforms, including PacBio and ONT complementary DNA (cDNA) sequencing and ONT direct RNA sequencing (dRNA). Our results demonstrate that Clair3-RNA achieved a ~91% SNP F1-score on the ONT platform using the latest ONT SQK-RNA004 kit (dRNA004) and a ~92% SNP F1-score on PacBio Iso-Seq and MAS-Seq for variants supported by at least four reads. With at least ten supporting reads and zygosity disregarded, the F1-score reached ~95% for ONT and ~96% for PacBio. With reads phased, performance reached ~97% for ONT and ~98% for PacBio. Extensive evaluation on various GIAB samples demonstrated that Clair3-RNA consistently outperformed existing callers and can accurately distinguish high-quality RNA editing sites from variants. Clair3-RNA is open-source and available at https://github.com/HKU-BAL/Clair3-RNA.
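The F1-scores above are reported under two conditions: a minimum number of supporting reads, and zygosity either enforced or disregarded. The sketch below shows one plausible way such a metric could be computed. It is not the authors' benchmarking pipeline. The `Call` record, the `f1_score` helper, and the choice to treat the truth set as already restricted to evaluable sites are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """Hypothetical record for one called SNP (not a Clair3-RNA type)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    genotype: str   # e.g. "0/1" or "1/1"
    alt_reads: int  # reads supporting the alternate allele


def f1_score(calls, truth, min_alt_reads=4, ignore_zygosity=False):
    """Return the F1-score over calls with enough supporting reads.

    `truth` maps (chrom, pos, ref, alt) -> genotype string and is assumed
    to be restricted already to the sites being evaluated.
    """
    tp = fp = 0
    for call in calls:
        if call.alt_reads < min_alt_reads:
            continue  # too few supporting reads: excluded from evaluation
        key = (call.chrom, call.pos, call.ref, call.alt)
        allele_match = key in truth
        genotype_match = allele_match and truth[key] == call.genotype
        if allele_match and (ignore_zygosity or genotype_match):
            tp += 1
        else:
            fp += 1
    # Truth variants without a matching call count as false negatives;
    # a genotype mismatch therefore counts as both FP and FN in strict mode.
    fn = len(truth) - tp
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


if __name__ == "__main__":
    truth = {
        ("chr1", 100, "A", "G"): "0/1",
        ("chr1", 200, "C", "T"): "0/1",
    }
    calls = [
        Call("chr1", 100, "A", "G", "0/1", 12),
        Call("chr1", 200, "C", "T", "1/1", 10),  # right allele, wrong zygosity
    ]
    print(f1_score(calls, truth, min_alt_reads=10))
    print(f1_score(calls, truth, min_alt_reads=10, ignore_zygosity=True))
```

In this toy input, disregarding zygosity turns the second call from an error into a match, which is the kind of effect that separates the strict and zygosity-agnostic figures quoted in the abstract.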
