Suppr
超能文献

Integration of RNA-Seq data with heterogeneous microarray data for breast cancer profiling.

Authors

Castillo Daniel, Gálvez Juan Manuel, Herrera Luis Javier, Román Belén San, Rojas Fernando, Rojas Ignacio

Affiliation

Department of Computer Architecture and Technology, University of Granada, Periodista Rafael Gómez Montero, 2, Granada, 18014, Spain.

Publication details

BMC Bioinformatics. 2017 Nov 21;18(1):506. doi: 10.1186/s12859-017-1925-0.

DOI:10.1186/s12859-017-1925-0

PMID:29157215

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5697344/

Abstract

BACKGROUND

Many public repositories now hold large microarray gene expression datasets. However, microarray technology is less powerful and less accurate than more recent Next Generation Sequencing technologies such as RNA-Seq. Microarray information is nonetheless reliable and robust, so it can be exploited by integrating microarray data with RNA-Seq data. In addition, extracting information from RNA-Seq and acquiring a large number of RNA-Seq samples still entails very high costs in time and computational resources. This paper proposes a new model to find the gene signature of breast cancer cell lines by integrating heterogeneous data from different breast cancer datasets obtained with microarray and RNA-Seq technologies. Data integration is expected to make the statistical significance of the results more robust. Finally, a classification method is proposed to test the robustness of the Differentially Expressed Genes when unseen data are presented for diagnosis.

RESULTS

The proposed data integration makes it possible to analyze gene expression samples from different technologies. The most significant genes across the integrated data were obtained by intersecting three gene sets: the expressed genes identified within the microarray data alone, within the RNA-Seq data alone, and within the integrated data from both technologies. This intersection reveals 98 possible technology-independent biomarkers. Two heterogeneous datasets were used for the classification tasks: a training dataset for gene expression identification and classifier validation, and a test dataset of unseen data for testing the classifier. Both achieved high classification accuracy, thereby confirming the validity of the obtained gene set as possible biomarkers for breast cancer. Through a feature selection process, a final small subset of six genes was selected for breast cancer diagnosis.
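
The intersection step can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the gene names, the three input sets, and the helper `technology_independent_genes` are hypothetical placeholders, and the abstract does not state how each per-dataset gene list was computed.

```python
from typing import Iterable, Set


def technology_independent_genes(
    microarray_genes: Iterable[str],
    rnaseq_genes: Iterable[str],
    integrated_genes: Iterable[str],
) -> Set[str]:
    """Return the genes that appear in all three candidate lists."""
    return set(microarray_genes) & set(rnaseq_genes) & set(integrated_genes)


# Toy gene lists, made up for illustration only (not results from the paper).
microarray = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
rnaseq = {"GENE_B", "GENE_C", "GENE_D", "GENE_E"}
integrated = {"GENE_C", "GENE_D", "GENE_F"}

shared = technology_independent_genes(microarray, rnaseq, integrated)
print(sorted(shared))  # ['GENE_C', 'GENE_D']
```

Only genes flagged in every analysis survive the intersection, which is why the resulting list can be read as a set of candidates that do not depend on a single technology.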

CONCLUSIONS

This work adds a novel data integration stage to the traditional gene expression analysis pipeline by combining heterogeneous data from microarray and RNA-Seq technologies. The available samples were successfully classified using a subset of six genes obtained by a feature selection method. As a result, a new classification and diagnosis tool was built, and its performance was validated on previously unseen samples.
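
The overall idea of selecting a small gene subset on training data and then checking it on unseen samples can be sketched as below. This is an illustration under stated assumptions: the abstract names neither the feature selection method nor the classifier, so a simple effect-size ranking and a nearest-centroid rule are used here as stand-ins, and all gene names, values, and function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence

Sample = Dict[str, float]  # gene name -> expression value


def rank_genes(samples: Sequence[Sample], labels: Sequence[int]) -> List[str]:
    """Rank genes by |difference of class means| divided by the summed class spread."""
    scores = {}
    for gene in samples[0]:
        pos = [s[gene] for s, y in zip(samples, labels) if y == 1]
        neg = [s[gene] for s, y in zip(samples, labels) if y == 0]
        spread = pstdev(pos) + pstdev(neg) + 1e-9  # guard against division by zero
        scores[gene] = abs(mean(pos) - mean(neg)) / spread
    return sorted(scores, key=scores.get, reverse=True)


def fit_centroids(samples, labels, genes):
    """Compute the mean expression vector of each class over the selected genes."""
    centroids = {}
    for cls in set(labels):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        centroids[cls] = [mean(r[g] for r in rows) for g in genes]
    return centroids


def predict(sample, centroids, genes):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(centroid):
        return sum((sample[g] - v) ** 2 for g, v in zip(genes, centroid))
    return min(centroids, key=lambda cls: dist(centroids[cls]))


# Toy training data (made up): label 1 = tumour, label 0 = control.
train = [
    {"G1": 5.0, "G2": 1.0, "G3": 0.5},
    {"G1": 5.2, "G2": 3.0, "G3": 0.4},
    {"G1": 1.0, "G2": 1.1, "G3": 2.0},
    {"G1": 1.2, "G2": 2.9, "G3": 2.1},
]
train_labels = [1, 1, 0, 0]

# Held-out samples that play the role of the unseen test dataset.
test = [
    {"G1": 4.8, "G2": 2.0, "G3": 0.6},
    {"G1": 0.9, "G2": 2.0, "G3": 1.9},
]
test_labels = [1, 0]

selected = rank_genes(train, train_labels)[:2]  # keep the two top-ranked genes
centroids = fit_centroids(train, train_labels, selected)
predictions = [predict(s, centroids, selected) for s in test]
accuracy = mean(int(p == y) for p, y in zip(predictions, test_labels))
print(selected, predictions, accuracy)
```

Gene ranking and centroid fitting use only the training samples, so the accuracy on the held-out samples is not influenced by the selection step.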

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c825/5697344/5f90f4e4ecbb/12859_2017_1925_Fig1_HTML.jpg
