Hatefi Armin 1, Jafari Jozani Mohammad 2
1 Department of Statistical Sciences, University of Toronto and The Fields Institute for Research in Mathematical Sciences, Toronto, Canada.
2 Department of Statistics, University of Manitoba, Winnipeg, Canada.
Stat Methods Med Res. 2017 Dec;26(6):2552-2566. doi: 10.1177/0962280215601458. Epub 2015 Aug 26.
Rank-based sampling designs are widely used when measuring the variable of interest is costly but the units in a small set can be ranked easily and cheaply before any final measurement is taken. When the variable of interest is binary, a common approach to ranking the sampling units is to estimate their probabilities of success with a logistic regression model. However, this requires training samples for model fitting. Moreover, once a sampling unit has been measured, the extra rank information obtained during ranking plays no further part in estimation. To address these issues, we propose using the partially rank-ordered set sampling design with multiple concomitants. Instead of fitting a logistic regression model, this approach employs a soft ranking technique to obtain, for each measured unit, a vector of weights representing the probability or degree of belief associated with its rank within a small set of sampling units. We construct an estimator that combines this rank information with the observed partially rank-ordered set measurements. The proposed methodology is applied to a breast cancer study to estimate the proportion of patients with malignant (cancerous) breast tumours in a given population. Through extensive numerical studies, we evaluate the performance of the estimator under concomitants with different ranking potentials (i.e. good, intermediate and bad) and under different tie structures among the ranks. We show that the partially rank-ordered set estimator is more precise than its counterparts under simple random sampling and ranked set sampling designs and, hence, that a smaller sample size is required to achieve a desired precision.
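The comparison between simple random sampling and ranked set sampling mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the partially rank-ordered set estimator with soft ranking weights described above. It implements only standard balanced ranked set sampling with a single concomitant and no ties, on simulated data. All names (`draw_unit`, `srs_estimate`, `rss_estimate`, `compare`), the prevalence of 0.3, and the Gaussian noise model for the concomitant are assumptions chosen for illustration.

```python
import random
import statistics


def draw_unit(rng, p, noise):
    """Simulate one unit: binary outcome y and a noisy concomitant x.

    Assumption for illustration: x = y + Gaussian noise, so a smaller
    `noise` means a concomitant with better ranking potential.
    """
    y = 1 if rng.random() < p else 0
    x = y + rng.gauss(0.0, noise)
    return y, x


def srs_estimate(rng, n, p, noise):
    """Sample proportion from a simple random sample of n measured units."""
    return sum(draw_unit(rng, p, noise)[0] for _ in range(n)) / n


def rss_estimate(rng, set_size, cycles, p, noise):
    """Sample proportion from a balanced ranked set sample.

    For each rank r, a fresh set of `set_size` units is drawn and ranked
    by the concomitant alone; only the unit holding rank r is measured.
    """
    measured = []
    for _ in range(cycles):
        for r in range(set_size):
            units = [draw_unit(rng, p, noise) for _ in range(set_size)]
            units.sort(key=lambda u: u[1])  # ranking uses x, never y
            measured.append(units[r][0])    # y is observed for one unit only
    return sum(measured) / len(measured)


def compare(reps=2000, set_size=3, cycles=10, p=0.3, noise=0.5, seed=1):
    """Monte Carlo mean and variance of both estimators at equal cost.

    Both designs measure set_size * cycles units per replication.
    """
    rng = random.Random(seed)
    n = set_size * cycles
    srs = [srs_estimate(rng, n, p, noise) for _ in range(reps)]
    rss = [rss_estimate(rng, set_size, cycles, p, noise) for _ in range(reps)]
    return {
        "srs_mean": statistics.mean(srs),
        "rss_mean": statistics.mean(rss),
        "srs_var": statistics.variance(srs),
        "rss_var": statistics.variance(rss),
    }


if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name}: {value:.5f}")
```

Under these assumptions both estimators centre on the true proportion, and the ranked set version shows the smaller Monte Carlo variance for the same number of measured units. Increasing `noise` weakens the concomitant and shrinks that gain, which loosely mirrors the good, intermediate and bad ranking potentials studied in the paper.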