Combining DNA methylation and RNA sequencing data of cancer for supervised knowledge extraction.

Authors

Cappelli Eleonora, Felici Giovanni, Weitschek Emanuel

Affiliations

1. Department of Engineering, Roma Tre University, Via della Vasca Navale, 70, Rome, 00146 Italy.

2. Institute of Systems Analysis and Computer Science, National Research Council, Via dei Taurini, 19, Rome, 00185 Italy.

Publication information

BioData Min. 2018 Oct 25;11:22. doi: 10.1186/s13040-018-0184-6. eCollection 2018.

DOI:10.1186/s13040-018-0184-6

PMID:30386434

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6203208/

Abstract

BACKGROUND

In the Next Generation Sequencing (NGS) era, large amounts of biological data are sequenced, analyzed, and stored in many public databases, whose interoperability is often required to improve accessibility. Combining heterogeneous NGS genomic data remains an open challenge: analyzing data from different experiments is a fundamental practice in the study of diseases. In this work, we propose combining DNA methylation and RNA sequencing NGS experiments at the gene level for supervised knowledge extraction in cancer.

METHODS

We retrieve DNA methylation and RNA sequencing datasets from The Cancer Genome Atlas (TCGA), focusing on Breast Invasive Carcinoma (BRCA), Thyroid Carcinoma (THCA), and Kidney Renal Papillary Cell Carcinoma (KIRP). We combine the RNA sequencing gene expression values with the gene methylation quantity, a new measure that we define to represent the amount of methylation associated with a gene. Additionally, we propose analyzing the combined data with tree- and rule-based classification algorithms (C4.5, Random Forest, RIPPER, and CAMUR).
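As a rough illustration of the gene-level combination described above, the following sketch merges per-gene expression values and per-gene methylation values into a single feature vector for one sample. The abstract does not give the definition of the gene methylation quantity, so the mean of site-level beta values mapped to a gene is used here purely as a placeholder assumption. The function names, the `_expr` and `_meth` suffixes, and the toy values are all hypothetical.

```python
from statistics import mean


def gene_methylation(site_betas, site_to_gene):
    """Aggregate site-level beta values into one value per gene.

    ASSUMPTION: the mean is only a placeholder; the paper defines
    its own measure, which is not specified in the abstract.
    """
    per_gene = {}
    for site, beta in site_betas.items():
        gene = site_to_gene.get(site)
        if gene is not None:  # skip sites that map to no gene
            per_gene.setdefault(gene, []).append(beta)
    return {gene: mean(values) for gene, values in per_gene.items()}


def combine_features(expression, methylation):
    """Build one feature dict holding both measures for each gene."""
    features = {f"{gene}_expr": value for gene, value in expression.items()}
    features.update({f"{gene}_meth": value for gene, value in methylation.items()})
    return features


# Toy data for a single sample (hypothetical identifiers and values)
site_to_gene = {"cg01": "GENE_A", "cg02": "GENE_A", "cg03": "GENE_B"}
site_betas = {"cg01": 0.2, "cg02": 0.4, "cg03": 0.9, "cg99": 0.5}
expression = {"GENE_A": 12.5, "GENE_B": 0.7}

sample = combine_features(expression, gene_methylation(site_betas, site_to_gene))
```

Each sample then carries two features per gene, so a tree- or rule-based classifier can pick either measure, or both, when building a model.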

RESULTS

We extract more than 15,000 classification models (composed of gene sets), which distinguish tumoral samples from normal ones with an average accuracy of 95%. From the integrated experiments we obtain about 5,000 classification models that consider gene measures from both the RNA sequencing and the DNA methylation experiments.

CONCLUSIONS

We compare the sets of genes obtained from the classifications on RNA sequencing and DNA methylation data with the genes obtained from the integration of the two experiments. The comparison yields several genes shared between the single experiments and the integrated ones (733 for BRCA, 35 for KIRP, and 861 for THCA) and 509 genes shared among the different experiments. Finally, we investigate possible relationships among the analyzed tumors by extracting a core set of 13 genes that appear in all tumors. A preliminary functional analysis confirms the relation of part of those genes (5 out of 13 and 279 out of 509) with cancer, suggesting that further studies should focus on the newly identified ones.
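The comparison described above amounts to set intersections over the gene lists extracted per experiment and per tumor. The sketch below shows one possible reading with made-up gene identifiers. The dictionary layout, the gene names, and the choice to define the core set as the intersection of the per-tumor shared sets are assumptions, not details taken from the study.

```python
from functools import reduce

# Hypothetical gene sets per tumor and experiment type
genes = {
    "BRCA": {
        "rnaseq": {"G1", "G2", "G3"},
        "methylation": {"G2", "G3", "G4"},
        "integrated": {"G2", "G3", "G5"},
    },
    "KIRP": {
        "rnaseq": {"G2", "G6"},
        "methylation": {"G2", "G7"},
        "integrated": {"G2", "G6"},
    },
    "THCA": {
        "rnaseq": {"G2", "G3", "G8"},
        "methylation": {"G2", "G3"},
        "integrated": {"G2", "G3", "G9"},
    },
}


def shared_within_tumor(experiments):
    """Genes found by every experiment type for one tumor."""
    return reduce(set.intersection, experiments.values())


# Genes common to the single and integrated experiments, per tumor
shared = {tumor: shared_within_tumor(exps) for tumor, exps in genes.items()}

# ASSUMPTION: core set = genes shared within every tumor
core = reduce(set.intersection, shared.values())
```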
