Wang Xiao-Feng, Ye Deping
Department of Quantitative Health Sciences / Biostatistics Section, Cleveland Clinic Lerner Research Institute, Cleveland, OH 44195, USA.
Department of Mathematics and Statistics, Memorial University of Newfoundland, St. John's, NL A1C 5S7, Canada.
J Multivar Anal. 2015 Jan 1;133:38-50. doi: 10.1016/j.jmva.2014.08.011.
This paper is motivated by a wide range of background correction problems in gene array data analysis, where the raw gene expression intensities are measured with error. Estimating a conditional density function from the contaminated expression data is a key aspect of statistical inference and visualization in these studies. We propose re-weighted deconvolution kernel methods to estimate the conditional density function in an additive error model, both when the error distribution is known and when it is unknown. Theoretical properties of the proposed estimators are investigated with respect to the mean absolute error from a "double asymptotic" view. Practical rules for selecting the smoothing parameters are developed. Simulated examples and an application to an Illumina bead microarray study are presented to illustrate the viability of the methods.
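The abstract gives no formulas, so the sketch below illustrates only the generic deconvolution-kernel idea behind the additive error model W = X + U; it is not the re-weighted estimator proposed in the paper. It assumes a Gaussian kernel K, Laplace(0, sigma) errors with known sigma (which yields the closed-form deconvoluting kernel K_U(z) = K(z) - (sigma^2/h^2) K''(z)), an error-free response Y, and fixed, hand-picked bandwidths h1 and h2. All function names are hypothetical, and the unknown-error case and the paper's bandwidth-selection rules are not covered.

```python
import math
import random


def gauss(z: float) -> float:
    """Standard normal density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def deconv_kernel_laplace(z: float, h: float, sigma: float) -> float:
    """Deconvoluting kernel K_U for a Gaussian K and Laplace(0, sigma) error.

    Assumes phi_U(t) = 1 / (1 + sigma^2 t^2), so that
    K_U(z) = K(z) - (sigma^2 / h^2) K''(z), and K''(z) = (z^2 - 1) K(z).
    """
    return gauss(z) * (1.0 - (sigma / h) ** 2 * (z * z - 1.0))


def deconv_density(x: float, w: list[float], h: float, sigma: float) -> float:
    """Deconvolution kernel estimate of f_X(x) from contaminated W = X + U."""
    n = len(w)
    return sum(deconv_kernel_laplace((x - wj) / h, h, sigma) for wj in w) / (n * h)


def deconv_conditional_density(y: float, x: float, w: list[float], ys: list[float],
                               h1: float, h2: float, sigma: float) -> float:
    """Plain ratio-type (Nadaraya-Watson style) estimate of f_{Y|X}(y | x).

    Hypothetical illustration only: X is observed as W = X + U, Y is assumed
    error-free. The x-direction weights use the deconvoluting kernel and can be
    negative, so the result need not be a proper density. This is NOT the
    re-weighted estimator from the paper.
    """
    weights = [deconv_kernel_laplace((x - wj) / h1, h1, sigma) for wj in w]
    denom = sum(weights)
    if denom == 0.0:
        raise ValueError("deconvolution weights sum to zero at this x")
    numer = sum(wt * gauss((y - yj) / h2) / h2 for wt, yj in zip(weights, ys))
    return numer / denom


if __name__ == "__main__":
    # Simulated data under the assumed model (illustrative values only).
    rng = random.Random(0)
    n, sigma = 500, 0.3
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Laplace(0, sigma) noise as the difference of two exponentials with mean sigma.
    ws = [x + rng.expovariate(1.0 / sigma) - rng.expovariate(1.0 / sigma) for x in xs]
    ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
    print(deconv_density(0.0, ws, 0.4, sigma))
    print(deconv_conditional_density(0.0, 0.0, ws, ys, 0.4, 0.3, sigma))
```

Because each Gaussian term in y integrates to one, the ratio form integrates to one over y whenever the denominator is nonzero, even though individual weights may be negative; guarding against such negativity is one reason more refined (for example re-weighted) constructions are of interest.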