Ordered quantile normalization: a semiparametric transformation built for the cross-validation era.

Authors

Peterson Ryan A, Cavanaugh Joseph E

Affiliations

Department of Biostatistics, University of Iowa College of Public Health, Iowa City, IA, USA.

Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.

Publication information

J Appl Stat. 2019 Jun 15;47(13-15):2312-2327. doi: 10.1080/02664763.2019.1630372. eCollection 2020.

DOI:10.1080/02664763.2019.1630372
PMID:35707424
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9042069/
Abstract

Normalization transformations have recently experienced a resurgence in popularity in the era of machine learning, particularly in data preprocessing. However, the classical methods that can be adapted to cross-validation are not always effective. We introduce Ordered Quantile (ORQ) normalization, a one-to-one transformation that is designed to consistently and effectively transform a vector of arbitrary distribution into a vector that follows a normal (Gaussian) distribution. In the absence of ties, ORQ normalization is guaranteed to produce normally distributed transformed data. Once trained, an ORQ transformation can be readily and effectively applied to new data. We compare the effectiveness of the ORQ technique with other popular normalization methods in a simulation study where the true data generating distributions are known. We find that ORQ normalization is the only method that works consistently and effectively, regardless of the underlying distribution. We also explore the use of repeated cross-validation to identify the best normalizing transformation when the true underlying distribution is unknown. We apply our technique and other normalization methods via the bestNormalize R package on a car pricing data set. We built bestNormalize to evaluate the normalization efficacy of many candidate transformations; the package is freely available via the Comprehensive R Archive Network.
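
The core idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the bestNormalize implementation: it assumes the rank-based mapping g(x) = Φ⁻¹((rank(x) − 1/2) / n), assumes the training vector has no ties, and handles new data by linear interpolation between training points. The name `fit_orq` is hypothetical, and clamping values that fall outside the training range is a simplification made here for brevity.

```python
from bisect import bisect_left
from statistics import NormalDist


def fit_orq(train):
    """Fit a rank-based normalizing map on `train` and return it as a function.

    Sketch only: assumes no ties in `train` (hypothetical helper, not the
    bestNormalize API).
    """
    xs = sorted(train)
    n = len(xs)
    if n < 2 or len(set(xs)) != n:
        raise ValueError("this sketch needs at least two distinct, untied values")

    std_normal = NormalDist()  # mean 0, standard deviation 1
    # The value with 1-based rank r maps to the normal quantile at (r - 0.5) / n.
    zs = [std_normal.inv_cdf((r - 0.5) / n) for r in range(1, n + 1)]

    def transform(x):
        # Assumption: values outside the training range are clamped to the
        # most extreme training score.
        if x <= xs[0]:
            return zs[0]
        if x >= xs[-1]:
            return zs[-1]
        j = bisect_left(xs, x)  # now xs[j - 1] < x <= xs[j]
        if xs[j] == x:
            return zs[j]
        # Linear interpolation between the two neighbouring training points.
        w = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
        return zs[j - 1] + w * (zs[j] - zs[j - 1])

    return transform


# Usage: fit on a heavily skewed training vector, then apply to new data.
train = [2.0, 50.0, 3.0, 1000.0, 7.0]
orq = fit_orq(train)
train_scores = [orq(x) for x in train]  # symmetric around 0 by construction
new_scores = [orq(x) for x in [5.0, 20.0, 5000.0]]
```

Because the map is fitted on the training vector alone and then reused, it can be estimated inside each cross-validation fold and applied to the held-out fold without leaking information from it.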

