VARUS: sampling complementary RNA reads from the sequence read archive.

Affiliations

Institute for Mathematics and Computer Science, University of Greifswald, Walther-Rathenau-Str. 47, Greifswald, 17489, Germany.

Center for Functional Genomics of Microbes, University of Greifswald, Felix-Hausdorff-Str. 8, Greifswald, 17489, Germany.

Publication information

BMC Bioinformatics. 2019 Nov 8;20(1):558. doi: 10.1186/s12859-019-3182-x.

DOI:10.1186/s12859-019-3182-x
PMID:31703556
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6842140/
Abstract

BACKGROUND

Vast amounts of next-generation sequencing RNA data have been deposited in archives, accompanying very diverse original studies. The data are also readily available for other purposes such as genome annotation or transcriptome assembly. However, selecting a subset of the available experiments, sequencing runs and reads for this purpose is a nontrivial task, complicated by the inhomogeneity of the data.

RESULTS

This article presents the software VARUS, which selects, downloads and aligns reads from NCBI's Sequence Read Archive, given only the species' binomial name and genome. VARUS automatically chooses runs from among all archived runs and randomly selects subsets of reads from them. The objective of its online algorithm is to cover a large number of transcripts adequately when network bandwidth and computing resources are limited. For most tested species, VARUS achieved both higher sensitivity and higher specificity with fewer downloaded reads than when runs were selected manually. Using twelve eukaryotic genomes as examples, we show that RNA-Seq data sampled with VARUS are well suited for fully automatic genome annotation with BRAKER.

CONCLUSIONS

With VARUS, genome annotation can be automated to the extent that not even the selection and quality control of RNA-Seq data has to be done manually. This makes it possible to run fully automated genome annotation loops over potentially many species without losing accuracy relative to a manually supervised annotation process.
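
The abstract describes the online run selection only at a high level. The Python sketch below is a toy illustration of that general idea, not VARUS's actual algorithm: it repeatedly picks the run whose reads seen so far fall most often on transcripts that are still under-covered, samples a random batch from it, and stops once no run is expected to help. All names (`sample_runs`, `expected_useful_fraction`, `cap`) are hypothetical, and the in-memory lists of transcript IDs are an assumption standing in for reads downloaded from the Sequence Read Archive and aligned to the genome.

```python
import random
from collections import Counter


def expected_useful_fraction(observed, coverage, cap):
    """Fraction of a run's observed reads that hit transcripts still below `cap`."""
    total = sum(observed.values())
    if total == 0:
        return float("inf")  # unseen run: sample it once to learn its profile
    useful = sum(n for t, n in observed.items() if coverage[t] < cap)
    return useful / total


def sample_runs(runs, max_batches, batch_size, cap, seed=0):
    """Toy online run selection (illustrative assumption, not the VARUS method).

    runs maps a run ID to a list of transcript IDs, one entry per read.
    Returns the sequence of chosen run IDs and the reads per transcript.
    """
    rng = random.Random(seed)
    coverage = Counter()                         # reads per transcript so far
    observed = {run_id: Counter() for run_id in runs}
    downloads = []
    for _ in range(max_batches):
        best = max(
            runs,
            key=lambda r: expected_useful_fraction(observed[r], coverage, cap),
        )
        if expected_useful_fraction(observed[best], coverage, cap) == 0:
            break                                # no run is expected to add useful reads
        batch = [rng.choice(runs[best]) for _ in range(batch_size)]
        observed[best].update(batch)
        coverage.update(batch)
        downloads.append(best)
    return downloads, coverage


# Run "A" only ever hits one transcript; run "B" spreads over four others.
runs = {"A": ["t1"] * 100, "B": ["t2", "t3", "t4", "t5"] * 25}
downloads, coverage = sample_runs(runs, max_batches=20, batch_size=10, cap=5)
print(downloads, dict(coverage))
```

In this toy setting, run "A" is sampled once and then dropped because its only transcript is already saturated, while further batches come from run "B". A real implementation would need a statistical model for transcripts that a run has not yet revealed, which this sketch deliberately leaves out.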

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/405e/6842140/e5ef20aa6f16/12859_2019_3182_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/405e/6842140/97c84f0d7022/12859_2019_3182_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/405e/6842140/6cba2db2fbcc/12859_2019_3182_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/405e/6842140/6bd7dbe99c25/12859_2019_3182_Fig4_HTML.jpg

Similar articles

1
VARUS: sampling complementary RNA reads from the sequence read archive.
BMC Bioinformatics. 2019 Nov 8;20(1):558. doi: 10.1186/s12859-019-3182-x.
2
Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm.
Nucleic Acids Res. 2014 Sep;42(15):e119. doi: 10.1093/nar/gku557. Epub 2014 Jul 2.
3
Necklace: combining reference and assembled transcriptomes for more comprehensive RNA-Seq analysis.
Gigascience. 2018 May 1;7(5). doi: 10.1093/gigascience/giy045.
4
FINDER: an automated software package to annotate eukaryotic genes from RNA-Seq data and associated protein sequences.
BMC Bioinformatics. 2021 Apr 20;22(1):205. doi: 10.1186/s12859-021-04120-9.
5
Enhancing Structural Annotation of Yeast Genomes with RNA-Seq Data.
Methods Mol Biol. 2016;1361:41-56. doi: 10.1007/978-1-4939-3079-1_2.
6
Freddie: annotation-independent detection and discovery of transcriptomic alternative splicing isoforms using long-read sequencing.
Nucleic Acids Res. 2023 Jan 25;51(2):e11. doi: 10.1093/nar/gkac1112.
7
GASS: genome structural annotation for Eukaryotes based on species similarity.
BMC Genomics. 2015 Mar 4;16(1):150. doi: 10.1186/s12864-015-1353-3.
8
Grape RNA-Seq analysis pipeline environment.
Bioinformatics. 2013 Mar 1;29(5):614-21. doi: 10.1093/bioinformatics/btt016. Epub 2013 Jan 17.
9
CLASS2: accurate and efficient splice variant annotation from RNA-seq reads.
Nucleic Acids Res. 2016 Jun 2;44(10):e98. doi: 10.1093/nar/gkw158. Epub 2016 Mar 14.
10
Optimization of next-generation sequencing transcriptome annotation for species lacking sequenced genomes.
Mol Ecol Resour. 2016 Mar;16(2):446-58. doi: 10.1111/1755-0998.12465. Epub 2015 Oct 14.

Cited by

1
Cell type-specific immune regulation under symbiosis in a facultatively symbiotic coral.
ISME J. 2025 Jan 2;19(1). doi: 10.1093/ismejo/wraf132.
2
The chromosomal genome sequence of the kidney sponge, Nardo, 1847, and its associated microbial metagenome sequences.
Wellcome Open Res. 2025 May 29;10:283. doi: 10.12688/wellcomeopenres.24166.1. eCollection 2025.
3
BRAKER3: Fully automated genome annotation using RNA-seq and protein evidence with GeneMark-ETP, AUGUSTUS, and TSEBRA.
Genome Res. 2024 Jun 25;34(5):769-777. doi: 10.1101/gr.278090.123.
4
Galba: genome annotation with miniprot and AUGUSTUS.
BMC Bioinformatics. 2023 Aug 31;24(1):327. doi: 10.1186/s12859-023-05449-z.
5
GALBA: Genome Annotation with Miniprot and AUGUSTUS.
bioRxiv. 2023 Apr 10:2023.04.10.536199. doi: 10.1101/2023.04.10.536199.
6
TSEBRA: transcript selector for BRAKER.
BMC Bioinformatics. 2021 Nov 25;22(1):566. doi: 10.1186/s12859-021-04482-0.
7
BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database.
NAR Genom Bioinform. 2021 Jan 6;3(1):lqaa108. doi: 10.1093/nargab/lqaa108. eCollection 2021 Mar.
8
GeneMark-EP+: eukaryotic gene prediction with self-training in the space of genes and proteins.
NAR Genom Bioinform. 2020 Jun;2(2):lqaa026. doi: 10.1093/nargab/lqaa026. Epub 2020 May 13.

References

1
Predicting Genes in Single Genomes with AUGUSTUS.
Curr Protoc Bioinformatics. 2019 Mar;65(1):e57. doi: 10.1002/cpbi.57. Epub 2018 Nov 22.
2
RNA-Seq differential expression analysis: An extended review and a software tool.
PLoS One. 2017 Dec 21;12(12):e0190152. doi: 10.1371/journal.pone.0190152. eCollection 2017.
3
Calculating the quality of public high-throughput sequencing data to obtain a suitable subset for reanalysis from the Sequence Read Archive.
Gigascience. 2017 Jun 1;6(6):1-8. doi: 10.1093/gigascience/gix029.
4
BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS.
Bioinformatics. 2016 Mar 1;32(5):767-9. doi: 10.1093/bioinformatics/btv661. Epub 2015 Nov 11.
5
HISAT: a fast spliced aligner with low memory requirements.
Nat Methods. 2015 Apr;12(4):357-60. doi: 10.1038/nmeth.3317. Epub 2015 Mar 9.
6
StringTie enables improved reconstruction of a transcriptome from RNA-seq reads.
Nat Biotechnol. 2015 Mar;33(3):290-5. doi: 10.1038/nbt.3122. Epub 2015 Feb 18.
7
Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm.
Nucleic Acids Res. 2014 Sep;42(15):e119. doi: 10.1093/nar/gku557. Epub 2014 Jul 2.
8
GenomeTools: a comprehensive software library for efficient processing of structured genome annotations.
IEEE/ACM Trans Comput Biol Bioinform. 2013 May-Jun;10(3):645-56. doi: 10.1109/TCBB.2013.68.
9
STAR: ultrafast universal RNA-seq aligner.
Bioinformatics. 2013 Jan 1;29(1):15-21. doi: 10.1093/bioinformatics/bts635. Epub 2012 Oct 25.
10
The sequence read archive.
Nucleic Acids Res. 2011 Jan;39(Database issue):D19-21. doi: 10.1093/nar/gkq1019. Epub 2010 Nov 9.