Integrated Systems for NGS Data Management and Analysis: Open Issues and Available Solutions.

Author information

Bianchi Valerio, Ceol Arnaud, Ogier Alessandro G E, de Pretis Stefano, Galeota Eugenia, Kishore Kamal, Bora Pranami, Croci Ottavio, Campaner Stefano, Amati Bruno, Morelli Marco J, Pelizzola Mattia

Affiliations

Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia Milano, Italy.

Department of Experimental Oncology, European Institute of Oncology Milano, Italy.

Publication information

Front Genet. 2016 May 6;7:75. doi: 10.3389/fgene.2016.00075. eCollection 2016.

DOI:10.3389/fgene.2016.00075
PMID:27200084
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4858535/
Abstract

Next-generation sequencing (NGS) technologies have deeply changed our understanding of cellular processes by delivering an astonishing amount of data at affordable prices; nowadays, many biology laboratories have already accumulated a large number of sequenced samples. However, managing and analyzing these data poses new challenges, which may easily be underestimated by research groups lacking IT and quantitative skills. In this perspective, we identify five issues that should be carefully addressed by research groups approaching NGS technologies. In particular, the five key issues to be considered concern: (1) adopting a laboratory information management system (LIMS) and safeguarding the resulting raw data structure in downstream analyses; (2) monitoring the flow of the data and standardizing input and output directories and file names, even when multiple analysis protocols are used on the same data; (3) ensuring complete traceability of the analyses performed; (4) enabling inexperienced users to run analyses through a graphical user interface (GUI) acting as a front-end for the pipelines; (5) relying on standard metadata to annotate the datasets and, when possible, using controlled vocabularies, ideally derived from biomedical ontologies. Finally, we discuss the currently available tools in the light of these issues, and we introduce HTS-flow, a new workflow management system conceived to address the concerns we raised. HTS-flow retrieves information from a LIMS database, manages data analyses through a simple GUI, outputs data in standard locations and allows complete traceability of datasets, accompanying metadata and analysis scripts.
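Issues (2), (3) and (5) are stated abstractly above. The following is a minimal Python sketch of what they can look like in practice; it is not HTS-flow's implementation or API. The directory layout `root/assay/protocol_version/sample_id`, the `Sample` fields, the `ASSAY_TERMS` vocabulary and the functions `validate`, `output_dir` and `provenance` are all hypothetical, chosen only to illustrate deterministic output locations, a provenance record that hashes the analysis script, and rejection of metadata that falls outside a controlled vocabulary.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from pathlib import PurePosixPath

# Hypothetical controlled vocabulary for the "assay" field (issue 5).
# These terms are illustrative placeholders; a real system would draw
# them from a biomedical ontology.
ASSAY_TERMS = {"RNA-seq", "ChIP-seq", "DNA-seq"}


@dataclass(frozen=True)
class Sample:
    sample_id: str  # hypothetical identifier as it would be stored in a LIMS (issue 1)
    assay: str      # must be one of ASSAY_TERMS


def validate(sample: Sample) -> None:
    """Reject metadata that uses free text instead of a controlled term."""
    if sample.assay not in ASSAY_TERMS:
        raise ValueError(f"unknown assay term: {sample.assay!r}")


def output_dir(root: str, sample: Sample, protocol: str, version: str) -> PurePosixPath:
    """Build a deterministic output location (issue 2).

    Running a second protocol on the same sample yields a sibling
    directory instead of overwriting the first result.
    """
    return PurePosixPath(root) / sample.assay / f"{protocol}_{version}" / sample.sample_id


def provenance(sample: Sample, script_text: str, params: dict) -> dict:
    """Record which sample, exact script and parameters produced a result (issue 3)."""
    return {
        "sample": asdict(sample),
        "script_sha256": hashlib.sha256(script_text.encode("utf-8")).hexdigest(),
        "params": params,
    }


if __name__ == "__main__":
    s = Sample(sample_id="S0001", assay="RNA-seq")
    validate(s)
    print(output_dir("/data/analyses", s, protocol="align", version="v1"))
    record = provenance(s, script_text="#!/bin/sh\necho align\n", params={"genome": "hg38"})
    print(json.dumps(record, sort_keys=True, indent=2))
```

Encoding the protocol and its version in the path is one way to keep several analyses of the same raw data side by side; a real system would also store the provenance record alongside the outputs or in a database so that each result can be traced back to its inputs.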
