Feature specific quantile normalization enables cross-platform classification of molecular subtypes using gene expression data.

Author information

Department of Molecular and Systems Biology.

Department of Environmental Health Sciences, Arnold School of Public Health, University of South Carolina, Columbia, SC, 29208, USA.

Publication information

Bioinformatics. 2018 Jun 1;34(11):1868-1874. doi: 10.1093/bioinformatics/bty026.

Abstract

MOTIVATION

Molecular subtypes of cancers and autoimmune disease, defined by transcriptomic profiling, have provided insight into disease pathogenesis, molecular heterogeneity and therapeutic responses. However, technical biases inherent to different gene expression profiling platforms present a unique problem when analyzing data generated from different studies. Currently, there is a lack of effective methods designed to eliminate platform-based bias. We present a method to normalize and classify RNA-seq data using machine learning classifiers trained on DNA microarray data and molecular subtypes in two datasets: breast invasive carcinoma (BRCA) and colorectal cancer (CRC).

RESULTS

Multiple analyses show that feature specific quantile normalization (FSQN) successfully removes platform-based bias from RNA-seq data, regardless of feature scaling or machine learning algorithm. We achieve up to 98% accuracy for BRCA data and 97% accuracy for CRC data in assigning molecular subtypes to RNA-seq data normalized using FSQN and a support vector machine trained exclusively on DNA microarray data. We find that maximum accuracy is achieved when the RNA-seq dataset being normalized contains at least 25 samples. FSQN allows comparison of RNA-seq data to existing DNA microarray datasets. Using these techniques, we can successfully leverage information from existing gene expression data in new analyses despite the different platforms used for gene expression profiling.
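
The idea of normalizing each feature separately can be illustrated with a minimal sketch. It assumes that FSQN maps, gene by gene, the values of the RNA-seq samples onto the distribution of the same gene in the microarray data. The function name `fsqn_sketch` and the arguments `source` and `target` are hypothetical and chosen only for illustration. The published R package may handle ties and quantile interpolation differently.

```python
import numpy as np


def fsqn_sketch(source, target):
    """Per-feature quantile normalization (illustrative sketch only).

    source: array of shape (n_source_samples, n_features), e.g. RNA-seq.
    target: array of shape (n_target_samples, n_features), e.g. microarray.
    Columns must refer to the same features (genes) in the same order.
    Returns an array shaped like `source` in which each column follows
    the empirical distribution of the matching `target` column.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    if source.ndim != 2 or target.ndim != 2:
        raise ValueError("source and target must be 2-D (samples x features)")
    if source.shape[1] != target.shape[1]:
        raise ValueError("source and target must share the same features")

    n_source = source.shape[0]
    # Mid-point quantile levels, one per source sample rank.
    levels = (np.arange(n_source) + 0.5) / n_source

    normalized = np.empty_like(source)
    for j in range(source.shape[1]):
        # Rank order of the source samples for this feature
        # (ties are broken by sample order, an assumption of this sketch).
        order = np.argsort(source[:, j], kind="stable")
        # Target quantiles for this feature at the same levels.
        target_quantiles = np.quantile(target[:, j], levels)
        # The k-th smallest source value receives the k-th target quantile.
        normalized[order, j] = target_quantiles
    return normalized
```

Under these assumptions, each row of the returned array can be passed to a classifier that was trained only on the microarray data, because every gene now lies on the microarray scale while the ordering of samples within each gene is preserved.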

AVAILABILITY AND IMPLEMENTATION

FSQN has been submitted as an R package to CRAN. All code used for this study is available on GitHub (https://github.com/jenniferfranks/FSQN).

CONTACT

michael.l.whitfield@dartmouth.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
