CancerDiscover: an integrative pipeline for cancer biomarker and cancer class prediction from high-throughput sequencing data.

Authors

Mohammed Akram, Biegert Greyson, Adamec Jiri, Helikar Tomáš

Affiliation

Department of Biochemistry, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America.

Publication details

Oncotarget. 2017 Dec 20;9(2):2565-2573. doi: 10.18632/oncotarget.23511. eCollection 2018 Jan 5.

DOI:10.18632/oncotarget.23511
PMID:29416792
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5788660/
Abstract

Accurately identifying cancer biomarkers and classifying cancer types and subtypes from high-throughput sequencing (HTS) data is challenging because it requires manual processing of raw HTS data from various sequencing platforms, quality control, and normalization, all of which are tedious and time-consuming. Machine learning techniques for cancer class prediction and biomarker discovery can hasten cancer detection and significantly improve prognosis. Considerable research effort has gone into cancer biomarker identification and cancer class prediction to date. However, currently available tools and pipelines lack flexibility in data preprocessing and in running multiple feature selection methods and learning algorithms, so researchers need a freely available, easy-to-use program. Here, we propose CancerDiscover, an integrative open-source software pipeline that lets users automatically and efficiently process large raw high-throughput datasets, normalize them, and select the best-performing features from multiple feature selection algorithms. Additionally, the pipeline lets users apply different feature thresholds to identify cancer biomarkers and build various training models to distinguish different types and subtypes of cancer. The open-source software is available at https://github.com/HelikarLab/CancerDiscover and is free to use under the GPL3 license.
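
The workflow the abstract outlines (normalize, rank features, apply a feature threshold, train a classifier) can be illustrated with a minimal Python sketch. This is not CancerDiscover's code or interface. Every function name here is hypothetical, the data are synthetic, and the specific techniques (z-score normalization, a signal-to-noise feature score, a nearest-centroid classifier) are simple stand-ins chosen for illustration, since the abstract does not name the algorithms the pipeline uses.

```python
import math
import random
from statistics import mean, pstdev

def zscore_columns(X):
    """Scale each feature (column) to zero mean and unit variance."""
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant features
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in X]

def rank_features(X, y):
    """Rank features by a two-class signal-to-noise score, best first."""
    classes = sorted(set(y))
    assert len(classes) == 2, "this stand-in score handles two classes only"
    a = [r for r, lab in zip(X, y) if lab == classes[0]]
    b = [r for r, lab in zip(X, y) if lab == classes[1]]
    scores = []
    for j in range(len(X[0])):
        ca = [r[j] for r in a]
        cb = [r[j] for r in b]
        spread = (pstdev(ca) + pstdev(cb)) or 1e-12
        scores.append(abs(mean(ca) - mean(cb)) / spread)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)

def select_top(ranking, threshold):
    """Keep the top `threshold` fraction of ranked features (at least one)."""
    k = max(1, round(len(ranking) * threshold))
    return ranking[:k]

def fit_centroids(X, y, features):
    """Compute one centroid per class over the selected features."""
    centroids = {}
    for label in set(y):
        rows = [r for r, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(r[j] for r in rows) for j in features]
    return centroids

def predict(centroids, row, features):
    """Assign the class whose centroid is closest in Euclidean distance."""
    point = [row[j] for j in features]
    return min(centroids, key=lambda lab: math.dist(point, centroids[lab]))

def make_toy_data(n_samples=60, n_features=20, informative=(0, 1, 2), seed=0):
    """Synthetic 'expression' matrix: only `informative` columns differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = "tumor" if i % 2 else "normal"
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        if label == "tumor":
            for j in informative:
                row[j] += 3.0
        X.append(row)
        y.append(label)
    return X, y

X, y = make_toy_data()
X = zscore_columns(X)  # normalized on all samples for brevity
train_X, train_y = X[:40], y[:40]
test_X, test_y = X[40:], y[40:]

ranking = rank_features(train_X, train_y)
results = {}
for threshold in (0.05, 0.15, 0.5):  # compare several feature thresholds
    feats = select_top(ranking, threshold)
    model = fit_centroids(train_X, train_y, feats)
    hits = sum(predict(model, r, feats) == lab for r, lab in zip(test_X, test_y))
    results[threshold] = (feats, hits / len(test_y))
    print(f"threshold={threshold:.2f} features={len(feats)} "
          f"accuracy={hits / len(test_y):.2f}")
```

Looping over several thresholds mirrors the abstract's point that users can compare feature cutoffs and models side by side. The features that survive the strictest cutoff while accuracy holds are the candidate biomarkers in this toy setting.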

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d84a/5788660/2327ff365af8/oncotarget-09-2565-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d84a/5788660/0b597e065678/oncotarget-09-2565-g001.jpg
