Department of Statistics, The Chinese University of Hong Kong, Shatin, Hong Kong.
Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Hong Kong.
Stat Med. 2023 Jul 10;42(15):2573-2589. doi: 10.1002/sim.9737. Epub 2023 May 10.
We consider the problem of estimating the nonparametric function in nonparametric logistic regression under a semi-supervised framework, in which a relatively small labeled data set collected by case-control sampling and a relatively large unlabeled data set containing only observations of the predictors are available. This problem arises in various applications where the outcome variable is expensive or difficult to observe directly. A two-stage nonparametric semi-supervised estimator based on the spline method is proposed to estimate the target regression function by maximizing the likelihood function of the labeled case-control data. The unlabeled data are used in the first stage to estimate the density function involved in the likelihood function. The consistency and functional asymptotic normality of the semi-supervised two-stage estimator are established under mild conditions. By making use of the unlabeled data, the proposed method estimates the target function more efficiently than its traditional supervised counterpart. The performance of the proposed method is evaluated through extensive simulation studies, and an application is illustrated with an analysis of a skin segmentation data set.
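To make concrete why a density function enters the likelihood of case-control data, the standard retrospective likelihood can be sketched as follows. The notation is an assumption introduced here for illustration and is not taken from the paper: g is the target function, p_g(x) the success probability, f the marginal density of the predictors, \pi_g the marginal case probability, n_1 and n_0 the numbers of cases and controls, and B(x) a spline basis with coefficient vector \beta. The paper's exact formulation and estimator may differ.

```latex
% Logistic model and marginal case probability (illustrative notation)
p_g(x) = P(Y = 1 \mid X = x) = \frac{e^{g(x)}}{1 + e^{g(x)}},
\qquad
\pi_g = \int p_g(x)\, f(x)\, dx .

% Under case-control sampling, predictors are drawn conditionally on the outcome
f_1(x) = \frac{p_g(x)\, f(x)}{\pi_g},
\qquad
f_0(x) = \frac{\{1 - p_g(x)\}\, f(x)}{1 - \pi_g} .

% Log-likelihood of the labeled sample, dropping \sum_i \log f(x_i), which is free of g
\ell(g) = \sum_{i=1}^{n_1 + n_0} \Bigl[ y_i\, g(x_i) - \log\bigl\{1 + e^{g(x_i)}\bigr\} \Bigr]
          - n_1 \log \pi_g - n_0 \log (1 - \pi_g) .

% Two-stage idea: plug in a density estimate \hat f from the unlabeled data,
% then maximize over the spline coefficients \beta with g(x) = B(x)^\top \beta
\hat\pi_g = \int p_g(x)\, \hat f(x)\, dx .
```

In this sketch the density f affects the estimate of g only through \pi_g, which suggests why a large unlabeled sample that pins down f can improve efficiency over a purely supervised fit.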