
SimBA: A methodology and tools for evaluating the performance of RNA-Seq bioinformatic pipelines.

Authors

Audoux Jérôme, Salson Mikaël, Grosset Christophe F, Beaumeunier Sacha, Holder Jean-Marc, Commes Thérèse, Philippe Nicolas

Affiliations

SeqOne, IRMB, CHRU de Montpellier -Hopital St Eloi, 80 avenue Augustin Fliche, Montpellier, 34295, France.

Institute of Computational Biology, Montpellier, 860, Rue Saint-Priest, Montpellier Cedex 5, 34095, France.

Publication information

BMC Bioinformatics. 2017 Sep 29;18(1):428. doi: 10.1186/s12859-017-1831-5.

DOI:10.1186/s12859-017-1831-5
PMID:28969586
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5623974/
Abstract

BACKGROUND

The evolution of next-generation sequencing (NGS) technologies has led to increased focus on RNA-Seq. Many bioinformatic tools have been developed for RNA-Seq analysis, each with unique performance characteristics and configuration parameters. Users face an increasingly complex task in understanding which bioinformatic tools are best for their specific needs and how they should be configured. To provide some answers to these questions, we investigate the performance of leading bioinformatic tools designed for RNA-Seq analysis and propose a methodology for systematic evaluation and comparison of performance to help users make well-informed choices.

RESULTS

To evaluate RNA-Seq pipelines, we developed a suite of two benchmarking tools. SimCT generates simulated datasets that get as close as possible to specific real biological conditions, accompanied by the list of genomic incidents and mutations that have been inserted. BenchCT then compares the output of any bioinformatics pipeline that has been run against a SimCT dataset with the simulated genomic and transcriptional variations it contains, giving an accurate evaluation of performance in addressing a specific biological question. We used these tools to simulate a real-world genomic medicine question involving the comparison of healthy and cancerous cells. Results revealed that performance in addressing a particular biological context varied significantly depending on the choice of tools and settings used. We also found that by combining the output of certain pipelines, substantial performance improvements could be achieved.
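The comparison step described above can be illustrated with a minimal sketch. It is not BenchCT's implementation or file format: the `Variant` tuple, the `evaluate` function, and all example coordinates are hypothetical, and a call is counted as correct only on an exact match, which is a simplification. The last two lines show two simple ways of combining pipeline outputs (intersection and union); the abstract does not state which combination strategy the authors used.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal record for one simulated or called mutation."""
    chrom: str
    pos: int
    ref: str
    alt: str


def evaluate(truth: set, calls: set) -> dict:
    """Score a call set against the simulated ground truth (exact matches only)."""
    tp = len(truth & calls)   # simulated events that were found
    fp = len(calls - truth)   # calls that were never simulated
    fn = len(truth - calls)   # simulated events that were missed
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Invented ground truth, standing in for the list emitted by a simulator.
truth = {
    Variant("chr1", 100, "A", "T"),
    Variant("chr1", 250, "G", "C"),
    Variant("chr2", 75, "T", "G"),
    Variant("chr3", 10, "C", "A"),
}

# Invented outputs of two pipelines run on the same simulated reads.
pipeline_a = {
    Variant("chr1", 100, "A", "T"),
    Variant("chr1", 250, "G", "C"),
    Variant("chr2", 75, "T", "G"),
    Variant("chr5", 5, "A", "G"),    # false positive
}
pipeline_b = {
    Variant("chr1", 100, "A", "T"),
    Variant("chr1", 250, "G", "C"),
    Variant("chr3", 10, "C", "A"),
    Variant("chr7", 42, "G", "T"),   # false positive
}

score_a = evaluate(truth, pipeline_a)
score_b = evaluate(truth, pipeline_b)
# Keeping only calls both pipelines agree on tends to raise precision.
score_both = evaluate(truth, pipeline_a & pipeline_b)
# Keeping calls made by either pipeline tends to raise recall.
score_any = evaluate(truth, pipeline_a | pipeline_b)
```

In this toy data each pipeline alone reaches a precision and recall of 0.75, the intersection removes both false positives at the cost of recall, and the union recovers every simulated event at the cost of precision. Which trade-off is preferable depends on the biological question being asked.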

CONCLUSION

Our research emphasizes the importance of selecting and configuring bioinformatic tools for the specific biological question being investigated to obtain optimal results. Pipeline designers, developers and users should include benchmarking in the context of their biological question as part of their design and quality control process. Our SimBA suite of benchmarking tools provides a reliable basis for comparing the performance of RNA-Seq bioinformatics pipelines in addressing a specific biological question. We would like to see the creation of a reference corpus of data sets that would allow accurate comparison between benchmarks performed by different groups, and the publication of more benchmarks based on this public corpus. The SimBA software and data sets are available at http://cractools.gforge.inria.fr/softwares/simba/ .

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba2d/5623974/82803fd7027e/12859_2017_1831_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba2d/5623974/fec0464d6f41/12859_2017_1831_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba2d/5623974/474e52d4d69a/12859_2017_1831_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba2d/5623974/a20c931c2e84/12859_2017_1831_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba2d/5623974/da88471b7f35/12859_2017_1831_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba2d/5623974/8ed4be0ad78b/12859_2017_1831_Fig6_HTML.jpg
