iCOMIC: a graphical interface-driven bioinformatics pipeline for analyzing cancer omics data.

Author information

Anilkumar Sithara Anjana, Maripuri Devi Priyanka, Moorthy Keerthika, Amirtha Ganesh Sai Sruthi, Philip Philge, Banerjee Shayantan, Sudhakar Malvika, Raman Karthik

Affiliations

Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology (IIT) Madras, Chennai 600036, India.

Centre for Integrative Biology and Systems mEdicine, IIT Madras, India.

Publication information

NAR Genom Bioinform. 2022 Jul 25;4(3):lqac053. doi: 10.1093/nargab/lqac053. eCollection 2022 Sep.

DOI:10.1093/nargab/lqac053

PMID:35899080

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9310080/

Abstract

Despite the tremendous increase in omics data generated by modern sequencing technologies, their analysis can be tricky and often requires substantial expertise in bioinformatics. To address this concern, we have developed a user-friendly pipeline to analyze (cancer) genomic data that takes raw sequencing data (FASTQ format) as input and outputs insightful statistics. Our iCOMIC toolkit pipeline, which features many independent workflows, is embedded in the popular Snakemake workflow management system. It can analyze whole-genome and transcriptome data and is characterized by a user-friendly GUI that offers several advantages, including minimal execution steps and no need for complex command-line arguments. Notably, we have integrated algorithms developed in-house to predict pathogenicity among cancer-causing mutations and to differentiate between tumor suppressor genes and oncogenes from somatic mutation data. We benchmarked our tool against the Genome in a Bottle benchmark dataset (NA12878) and obtained the highest F1 scores, 0.971 for indels and 0.988 for SNPs, using the BWA-MEM/GATK HaplotypeCaller (HC) DNA-Seq pipeline. Similarly, we achieved a correlation coefficient of r = 0.85 using the HISAT2-StringTie-ballgown and STAR-StringTie-ballgown RNA-Seq pipelines on the human monocyte dataset (SRP082682). Overall, our tool enables easy analyses of omics datasets, significantly simplifying complex data analysis pipelines.
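The abstract reports two kinds of benchmark metrics: an F1 score for variant calls and a correlation coefficient for expression estimates. As a minimal sketch of how such figures are commonly computed, the Python below derives F1 from true-positive, false-positive, and false-negative counts and computes a Pearson correlation between two expression vectors. The function names and all input values are hypothetical and are not taken from the paper; the abstract does not state which correlation measure was used, so Pearson is an assumption here.

```python
from math import sqrt


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the harmonic mean of precision and recall.

    tp, fp, fn are counts of true-positive, false-positive and
    false-negative variant calls against a truth set.
    """
    if tp == 0:
        # No correct calls: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Return the Pearson correlation coefficient of two equal-length vectors."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical counts and expression values, for illustration only.
    print(f1_score(tp=90, fp=10, fn=10))
    print(pearson_r([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```

In practice, the counts would come from comparing a pipeline's VCF output with a truth set, and the vectors from per-gene expression estimates produced by two methods.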

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f3c7/9310080/3fd7b38dd8d9/lqac053fig1.jpg

Similar articles

iCOMIC: a graphical interface-driven bioinformatics pipeline for analyzing cancer omics data.

NAR Genom Bioinform. 2022 Jul 25;4(3):lqac053. doi: 10.1093/nargab/lqac053. eCollection 2022 Sep.

hppRNA - a Snakemake-based handy parameter-free pipeline for RNA-Seq analysis of numerous samples.

Brief Bioinform. 2018 Jul 20;19(4):622-626. doi: 10.1093/bib/bbw143.

High-throughput bioinformatics with the Cyrille2 pipeline system.

BMC Bioinformatics. 2008 Feb 12;9:96. doi: 10.1186/1471-2105-9-96.

An integrated SNP mining and utilization (ISMU) pipeline for next generation sequencing data.

PLoS One. 2014 Jul 8;9(7):e101754. doi: 10.1371/journal.pone.0101754. eCollection 2014.

scAnalyzeR: A Comprehensive Software Package With Graphical User Interface for Single-Cell RNA Sequencing Analysis and its Application on Liver Cancer.

Technol Cancer Res Treat. 2022 Jan-Dec;21:15330338221142729. doi: 10.1177/15330338221142729.

Automated Isoform Diversity Detector (AIDD): a pipeline for investigating transcriptome diversity of RNA-seq data.

BMC Bioinformatics. 2020 Dec 30;21(Suppl 18):578. doi: 10.1186/s12859-020-03888-6.

Improved RNA-seq Workflows Using CyVerse Cyberinfrastructure.

Curr Protoc Bioinformatics. 2018 Sep;63(1):e53. doi: 10.1002/cpbi.53. Epub 2018 Aug 31.

SeqAssist: a novel toolkit for preliminary analysis of next-generation sequencing data.

BMC Bioinformatics. 2014;15 Suppl 11(Suppl 11):S10. doi: 10.1186/1471-2105-15-S11-S10. Epub 2014 Oct 21.

Grape RNA-Seq analysis pipeline environment.

Bioinformatics. 2013 Mar 1;29(5):614-21. doi: 10.1093/bioinformatics/btt016. Epub 2013 Jan 17.

Watchdog - a workflow management system for the distributed analysis of large-scale experimental data.

BMC Bioinformatics. 2018 Mar 13;19(1):97. doi: 10.1186/s12859-018-2107-4.

Cited by

MIRACUM-Pipe: An Adaptable Pipeline for Next-Generation Sequencing Analysis, Reporting, and Visualization for Clinical Decision Making.

Cancers (Basel). 2023 Jul 1;15(13):3456. doi: 10.3390/cancers15133456.

Periodontal treatment and microbiome-targeted therapy in management of periodontitis-related nonalcoholic fatty liver disease with oral and gut dysbiosis.

World J Gastroenterol. 2023 Feb 14;29(6):967-996. doi: 10.3748/wjg.v29.i6.967.

References

Novel ratio-metric features enable the identification of new driver genes across cancer types.

Sci Rep. 2022 Jan 7;12(1):5. doi: 10.1038/s41598-021-04015-y.

Sequence Neighborhoods Enable Reliable Prediction of Pathogenic Mutations in Cancer Genomes.

Cancers (Basel). 2021 May 14;13(10):2366. doi: 10.3390/cancers13102366.

The nf-core framework for community-curated bioinformatics pipelines.

Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x.

Analysis of RNA Sequencing Data Using CLC Genomics Workbench.

Methods Mol Biol. 2020;2102:61-113. doi: 10.1007/978-1-0716-0223-2_4.

GenPipes: an open-source framework for distributed and scalable genomic analyses.

Gigascience. 2019 Jun 1;8(6). doi: 10.1093/gigascience/giz037.

snakePipes: facilitating flexible, scalable and integrative epigenomic analysis.

Bioinformatics. 2019 Nov 1;35(22):4757-4759. doi: 10.1093/bioinformatics/btz436.

Next-generation sequencing and its clinical application.

Cancer Biol Med. 2019 Feb;16(1):4-10. doi: 10.20892/j.issn.2095-3941.2018.0055.

ARMOR: An Automated Reproducible MOdular Workflow for Preprocessing and Differential Analysis of RNA-seq Data.

G3 (Bethesda). 2019 Jul 9;9(7):2089-2096. doi: 10.1534/g3.119.400185. Print 2019 Jul 1.

An open resource for accurately benchmarking small variant and reference calls.

Nat Biotechnol. 2019 May;37(5):561-566. doi: 10.1038/s41587-019-0074-6. Epub 2019 Apr 1.

Best practices for benchmarking germline small-variant calls in human genomes.

Nat Biotechnol. 2019 May;37(5):555-560. doi: 10.1038/s41587-019-0054-x. Epub 2019 Mar 11.
