Cross-platform normalization enables machine learning model training on microarray and RNA-seq data simultaneously.

Affiliations

Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.

Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, Wynnewood, PA, USA.

Publication details

Commun Biol. 2023 Feb 25;6(1):222. doi: 10.1038/s42003-023-04588-6.

DOI: 10.1038/s42003-023-04588-6

PMID: 36841852

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9968332/

Abstract

Large compendia of gene expression data have proven valuable for the discovery of novel biological relationships. Historically, most available RNA assays were run on microarray, while RNA-seq is now the platform of choice for many new experiments. The data structure and distributions between the platforms differ, making it challenging to combine them directly. Here we perform supervised and unsupervised machine learning evaluations to assess which existing normalization methods are best suited for combining microarray and RNA-seq data. We find that quantile and Training Distribution Matching normalization allow for supervised and unsupervised model training on microarray and RNA-seq data simultaneously. Nonparanormal normalization and z-scores are also appropriate for some applications, including pathway analysis with Pathway-Level Information Extractor (PLIER). We demonstrate that it is possible to perform effective cross-platform normalization using existing methods to combine microarray and RNA-seq data for machine learning applications.
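The abstract names quantile normalization and z-scores among the methods evaluated. The sketch below illustrates both in NumPy; it is not the authors' implementation. It assumes genes x samples matrices that share the same gene rows, takes the quantile target distribution from a reference matrix (here a microarray-like one), and computes z-scores within each platform separately. The function names and toy data are hypothetical, and Training Distribution Matching and nonparanormal normalization are not shown.

```python
import numpy as np


def quantile_normalize_to_target(X: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Quantile-normalize the columns (samples) of X onto a reference distribution.

    X: genes x samples matrix to normalize (e.g. log-transformed RNA-seq).
    reference: genes x samples matrix whose pooled distribution is the target
        (e.g. microarray). Assumption: both matrices contain the same genes
        in the same row order.
    """
    # Target distribution: mean of each order statistic across reference samples.
    target = np.sort(reference, axis=0).mean(axis=1)
    # Within-sample rank of every value (ties are broken by position).
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)
    # Replace each value with the target value at the same rank.
    return target[ranks]


def zscore_genes(X: np.ndarray) -> np.ndarray:
    """Standardize each gene (row) to mean 0 and sample standard deviation 1."""
    mean = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, ddof=1, keepdims=True)
    sd[sd == 0] = 1.0  # constant genes become 0 instead of dividing by zero
    return (X - mean) / sd


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy data: 5 genes measured on two platforms.
    array_data = rng.normal(8.0, 2.0, size=(5, 4))          # microarray-like, log scale
    seq_data = np.log2(rng.poisson(50, size=(5, 3)) + 1.0)  # RNA-seq-like, log2(count + 1)

    # Quantile normalization: put RNA-seq samples on the microarray distribution.
    seq_qn = quantile_normalize_to_target(seq_data, array_data)
    combined_qn = np.hstack([array_data, seq_qn])
    print(combined_qn.shape)

    # z-scores: standardize each platform separately, then combine.
    combined_z = np.hstack([zscore_genes(array_data), zscore_genes(seq_data)])
    print(combined_z.shape)
```

After either step the two platforms can be stacked into a single genes x samples matrix for model training. Quantile normalization forces every RNA-seq sample to share the reference's value distribution, while per-platform z-scores only remove gene-level location and scale differences.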

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/31a1/9968332/2a229a8efc1a/42003_2023_4588_Fig1_HTML.jpg

Similar articles

Cross-platform normalization of microarray and RNA-seq data for machine learning applications.

PeerJ. 2016 Jan 21;4:e1621. doi: 10.7717/peerj.1621. eCollection 2016.

Feature specific quantile normalization enables cross-platform classification of molecular subtypes using gene expression data.

Bioinformatics. 2018 Jun 1;34(11):1868-1874. doi: 10.1093/bioinformatics/bty026.

Biological classification with RNA-seq data: Can alternatively spliced transcript expression enhance machine learning classifiers?

RNA. 2018 Sep;24(9):1119-1132. doi: 10.1261/rna.062802.117. Epub 2018 Jun 25.

A machine learning-based method for automatically identifying novel cells in annotating single-cell RNA-seq data.

Bioinformatics. 2022 Oct 31;38(21):4885-4892. doi: 10.1093/bioinformatics/btac617.

Feature-specific quantile normalization and feature-specific mean-variance normalization deliver robust bi-directional classification and feature selection performance between microarray and RNAseq data.

BMC Bioinformatics. 2024 Mar 29;25(1):136. doi: 10.1186/s12859-024-05759-w.

Parallel comparison of Illumina RNA-Seq and Affymetrix microarray platforms on transcriptomic profiles generated from 5-aza-deoxy-cytidine treated HT-29 colon cancer cells and simulated datasets.

BMC Bioinformatics. 2013;14 Suppl 9(Suppl 9):S1. doi: 10.1186/1471-2105-14-S9-S1. Epub 2013 Jun 28.

Identification of transcriptional subtypes in lung adenocarcinoma and squamous cell carcinoma through integrative analysis of microarray and RNA sequencing data.

Sci Rep. 2021 Apr 22;11(1):8709. doi: 10.1038/s41598-021-88209-4.

Seq-ing improved gene expression estimates from microarrays using machine learning.

BMC Bioinformatics. 2015 Sep 4;16:286. doi: 10.1186/s12859-015-0712-z.

Analysis of Microarray and RNA-seq Expression Profiling Data.

Cold Spring Harb Protoc. 2017 Mar 1;2017(3):pdb.top093104. doi: 10.1101/pdb.top093104.

Cited by

Machine learning-based identification of diagnostic and prognostic mitotic cell cycle genes in hepatocellular carcinoma.

PLoS One. 2025 Aug 28;20(8):e0331118. doi: 10.1371/journal.pone.0331118. eCollection 2025.

Normalization and Selecting Non-Differentially Expressed Genes Improve Machine Learning Modelling of Cross-Platform Transcriptomic Data.

Trans Artif Intell. 2025;1(1). doi: 10.53941/tai.2025.100005. Epub 2025 May 25.

Machine Learning Approach and Bioinformatics Analysis Discovered Key Genomic Signatures for Hepatitis B Virus-Associated Hepatocyte Remodeling and Hepatocellular Carcinoma.

Cancer Inform. 2025 Apr 16;24:11769351251333847. doi: 10.1177/11769351251333847. eCollection 2025.

Association of normalization, non-differentially expressed genes and data source with machine learning performance in intra-dataset or cross-dataset modelling of transcriptomic and clinical data.

ArXiv. 2025 Feb 27:arXiv:2502.18888v2.

Normalization and selecting non-differentially expressed genes improve machine learning modelling of cross-platform transcriptomic data.

ArXiv. 2025 Jan 24:arXiv:2501.14248v1.

Robust Cluster Prediction Across Data Types Validates Association of Sex and Therapy Response in GBM.

Cancers (Basel). 2025 Jan 28;17(3):445. doi: 10.3390/cancers17030445.

Predicting the Progression from Asymptomatic to Symptomatic Multiple Myeloma and Stage Classification Using Gene Expression Data.

Cancers (Basel). 2025 Jan 20;17(2):332. doi: 10.3390/cancers17020332.

Transcriptomic analysis of human cartilage identified potential therapeutic targets for hip osteoarthritis.

Hum Mol Genet. 2025 Feb 17;34(5):444-453. doi: 10.1093/hmg/ddae200.

Investigating heart rate variability measures during pregnancy as predictors of postpartum depression and anxiety: an exploratory study.

Transl Psychiatry. 2024 May 14;14(1):203. doi: 10.1038/s41398-024-02909-9.

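Feature-specific quantile normalization and feature-specific mean-variance normalization deliver robust bi-directional classification and feature selection performance between microarray and RNAseq data.

BMC Bioinformatics. 2024 Mar 29;25(1):136. doi: 10.1186/s12859-024-05759-w.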

References

Widespread redundancy in -omics profiles of cancer mutation states.

Genome Biol. 2022 Jun 27;23(1):137. doi: 10.1186/s13059-022-02705-y.

meGPS: a multi-omics signature for hepatocellular carcinoma detection integrating methylome and transcriptome data.

Bioinformatics. 2022 Jul 11;38(14):3513-3522. doi: 10.1093/bioinformatics/btac379.

Benchmarking atlas-level data integration in single-cell genomics.

Nat Methods. 2022 Jan;19(1):41-50. doi: 10.1038/s41592-021-01336-8. Epub 2021 Dec 23.

Integrated analysis of multimodal single-cell data.

Cell. 2021 Jun 24;184(13):3573-3587.e29. doi: 10.1016/j.cell.2021.04.048. Epub 2021 May 31.

Computational principles and challenges in single-cell data integration.

Nat Biotechnol. 2021 Oct;39(10):1202-1215. doi: 10.1038/s41587-021-00895-7. Epub 2021 May 3.

Uniform genomic data analysis in the NCI Genomic Data Commons.

Nat Commun. 2021 Feb 22;12(1):1226. doi: 10.1038/s41467-021-21254-9.

A flexible, interpretable, and accurate approach for imputing the expression of unmeasured genes.

Nucleic Acids Res. 2020 Dec 2;48(21):e125. doi: 10.1093/nar/gkaa881.

Compressing gene expression data using multiple latent space dimensionalities learns complementary biological representations.

Genome Biol. 2020 May 11;21(1):109. doi: 10.1186/s13059-020-02021-3.

Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression.

Genome Biol. 2019 Dec 23;20(1):296. doi: 10.1186/s13059-019-1874-1.

Pathway-level information extractor (PLIER) for gene expression data.

Nat Methods. 2019 Jul;16(7):607-610. doi: 10.1038/s41592-019-0456-1. Epub 2019 Jun 27.
