Cross-platform normalization of microarray and RNA-seq data for machine learning applications.

Author information

Thompson Jeffrey A, Tan Jie, Greene Casey S

Affiliations

Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, United States of America; Quantitative Biomedical Sciences Program, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, United States of America.

Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, United States of America; Molecular and Cellular Biology, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, United States of America.

Publication information

PeerJ. 2016 Jan 21;4:e1621. doi: 10.7717/peerj.1621. eCollection 2016.

Abstract

Large, publicly available gene expression datasets are often analyzed with the aid of machine learning algorithms. Although RNA-seq is increasingly the technology of choice, a wealth of expression data already exist in the form of microarray data. If machine learning models built from legacy data can be applied to RNA-seq data, larger, more diverse training datasets can be created and validation can be performed on newly generated data. We developed Training Distribution Matching (TDM), which transforms RNA-seq data for use with models constructed from legacy platforms. We evaluated TDM, as well as quantile normalization, nonparanormal transformation, and a simple log2 transformation, on both simulated and biological datasets of gene expression. Our evaluation included both supervised and unsupervised machine learning approaches. We found that TDM exhibited consistently strong performance across settings and that quantile normalization also performed well in many circumstances. We also provide a TDM package for the R programming language.
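Three of the transformations compared above have standard textbook definitions, so a short sketch may help make them concrete. The block below gives minimal NumPy versions of the log2 baseline, quantile normalization of RNA-seq samples onto a reference (for example, microarray) distribution, and a rank-based nonparanormal transform. It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, genes are rows and samples are columns, both matrices are assumed to cover the same genes in the same order, ties are broken by position rather than averaged, and the nonparanormal variant uses Φ⁻¹(rank / (n + 1)) instead of the truncated empirical CDF from the original nonparanormal formulation. TDM itself is not reproduced here; the authors distribute it as an R package.

```python
import numpy as np
from statistics import NormalDist

# Hypothetical helpers for illustration; not the API of the authors' TDM package.
# Convention assumed throughout: rows are genes, columns are samples.


def log2_transform(counts: np.ndarray) -> np.ndarray:
    """Simple baseline: log2(x + 1); the pseudocount of 1 avoids log2(0)."""
    return np.log2(np.asarray(counts, dtype=float) + 1.0)


def quantile_normalize_to_target(data: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Give every sample (column) of `data` the value distribution of `target`.

    `target` is the reference matrix (e.g. microarray training data) and must
    have the same number of rows (genes) as `data`. The reference distribution
    is the mean, across target samples, of each rank's sorted value. Ties in
    `data` are broken by position rather than averaged (a simplification).
    """
    data = np.asarray(data, dtype=float)
    target = np.asarray(target, dtype=float)
    if data.shape[0] != target.shape[0]:
        raise ValueError("data and target must have the same number of genes (rows)")
    ref = np.sort(target, axis=0).mean(axis=1)  # ref[k] = mean k-th smallest value
    out = np.empty_like(data)
    for j in range(data.shape[1]):
        order = np.argsort(data[:, j], kind="stable")  # row indices, lowest value first
        out[order, j] = ref  # lowest value gets ref[0], next gets ref[1], ...
    return out


def nonparanormal(data: np.ndarray) -> np.ndarray:
    """Rank-based nonparanormal transform of each gene (row) across samples.

    Each value becomes Phi^{-1}(rank / (n + 1)), with n the number of samples;
    dividing by n + 1 keeps the argument strictly inside (0, 1).
    """
    data = np.asarray(data, dtype=float)
    n = data.shape[1]
    # Double argsort yields 0-based ranks within each row; +1 makes them 1..n.
    ranks = data.argsort(axis=1, kind="stable").argsort(axis=1, kind="stable") + 1
    inv_cdf = np.vectorize(NormalDist().inv_cdf, otypes=[float])
    return inv_cdf(ranks / (n + 1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins only: not real microarray or RNA-seq data.
    array_train = rng.normal(8.0, 2.0, size=(100, 20))
    seq_counts = rng.poisson(50.0, size=(100, 5))
    seq_log = log2_transform(seq_counts)
    seq_qn = quantile_normalize_to_target(seq_counts, array_train)
    seq_npn = nonparanormal(seq_counts)
    print(seq_log.shape, seq_qn.shape, seq_npn.shape)
```

Because both quantile normalization and the nonparanormal transform depend only on ranks, applying log2 to the counts first would not change their output. The key difference is that quantile normalization pulls every transformed sample onto the reference data's value distribution, whereas this nonparanormal transform never looks at the reference data and only reshapes each gene's values toward a standard normal.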
