SAKit: An all-in-one analysis pipeline for identifying novel proteins resulting from variant events at both large and small scales.

Affiliations

Department of Breast Surgery, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing 100730, P. R. China.

Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, P. R. China.

Publication information

J Bioinform Comput Biol. 2024 Oct;22(5):2450022. doi: 10.1142/S0219720024500227. Epub 2024 Oct 1.

Abstract

Genetic mutations that cause the inactivation or aberrant activation of essential proteins may trigger alterations or even dysfunctions in cellular signaling pathways, culminating in the development of precancerous lesions and cancer. Mutations and such dysfunctions can result in the generation of "novel proteins" that are not part of the conventional human proteome. Identification of these proteins carries a profound potential for unraveling promising drug targets and designing innovative therapeutic models. Despite the emergence of diverse tools for detecting DNA or RNA variants, facilitated by the widespread adoption of nucleotide sequencing technology, these methods primarily target point mutations and exhibit suboptimal performance in detecting large-scale and combinatorial mutations. Additionally, the outcomes of these tools are confined to the genome and transcriptome levels, and do not provide the corresponding protein information resulting from genetic alterations. We present the development of Sequencing Analysis Kit (SAKit), a bioinformatics pipeline for hybrid sequencing analysis integrating long-read and short-read RNA sequencing data. Long reads are utilized for detecting large-scale variations such as gene fusions, exon skipping, intron retention, and aberrant expression in non-coding regions, owing to their excellent coverage capabilities. Short reads serve to validate these findings at breakpoints and splice junctions. Conversely, short reads are employed for identifying small-scale variations, including single nucleotide variants, deletions, and insertions, due to their superior sequencing depth, with long reads providing additional validation. SAKit is designed to perform analyses using inter-species configuration files comprising genome references and annotation data, making it applicable to both human and mouse studies. 
Furthermore, SAKit implements a hierarchical filtering approach to eliminate low-confidence variants and employs open reading frame (ORF) analysis to translate identified variants into protein sequences. SAKit is a robust and versatile bioinformatics tool designed for the comprehensive identification of both large-scale and small-scale variants from RNA-seq data, facilitating the discovery of novel proteins. This pipeline integrates analysis of long-read and short-read sequencing data, offering a powerful solution for researchers in genomics and transcriptomics. SAKit is freely accessible and open-source, available through GitHub (https://github.com/therarna/SAKit) and as a Docker image (https://hub.docker.com/repository/docker/therarna). Implemented primarily within a Snakemake framework using Python, SAKit ensures reproducibility, scalability, and ease of use for the scientific community.
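The ORF analysis step mentioned above can be illustrated with a minimal sketch. The abstract does not describe how SAKit performs this step, so everything here is an assumption for illustration only: the function name `longest_orf`, the restriction to the three forward frames, the requirement of an ATG start and an in-frame stop codon, and the example sequence are all hypothetical. Only the standard genetic code table is taken as given.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def longest_orf(transcript: str) -> str:
    """Return the longest complete ORF (ATG ... stop) as a protein string.

    Hypothetical helper: scans the three forward reading frames only and
    ignores ORFs that run off the end of the sequence without a stop codon.
    Returns an empty string when no complete ORF is found.
    """
    seq = transcript.upper().replace("U", "T")
    best = ""
    for frame in range(3):
        current = None  # amino acids of the ORF currently being read
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            aa = CODON_TABLE.get(codon, "X")  # "X" for ambiguous bases
            if current is None:
                if codon == "ATG":
                    current = ["M"]  # open a new ORF at the start codon
            elif aa == "*":
                candidate = "".join(current)
                if len(candidate) > len(best):
                    best = candidate
                current = None  # stop codon closes the ORF
            else:
                current.append(aa)
    return best


if __name__ == "__main__":
    # Toy variant transcript (made up): the ORF sits in the third frame.
    print(longest_orf("CCATGGCTTGGTAAGG"))
```

In a real pipeline, the translated sequences would then be compared against a reference proteome to decide which ones are novel; that comparison is outside the scope of this sketch.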
