Qureshi Muhammad Nouman, Ahelali Marwan H, Iftikhar Soofia, Hassan Amal, Alamri Osama Abdulaziz, Manzoor Summaira, Hanif Muhammad
School of Statistics, University of Minnesota, Minneapolis, USA.
Institute for Social Research, University of Michigan, Ann Arbor, MI, USA.
Heliyon. 2024 Jun 4;10(11):e32355. doi: 10.1016/j.heliyon.2024.e32355. eCollection 2024 Jun 15.
Estimating dispersion in populations that are extremely rare, hidden, geographically clustered, and hard to access is a well-known challenge. Conventional sampling approaches tend to overestimate the variance in such populations. In this setting, adaptive cluster sampling is considered the most efficient sampling technique because it generally yields a lower variance than conventional probability sampling designs when estimating parameters of rare and geographically clustered populations, such as the mean, total, and variance. Auxiliary data are commonly used to improve the precision of estimators by exploiting the correlation between the survey variable and the auxiliary variable. In this article, we introduce a generalized estimator of the variance of populations that are rare, hidden, geographically clustered, and hard to reach. The proposed estimator leverages both actual and transformed auxiliary data under adaptive cluster sampling. Expressions for the approximate bias and mean squared error of the proposed estimator are derived to the first order of approximation using a Taylor expansion. Some special cases are also obtained using known parameters of the auxiliary variable. The proposed class of estimators is compared with existing estimators using simulation and real-data applications.
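To make the setting concrete, the following is a minimal sketch of how a variance estimate can borrow strength from an auxiliary variable under adaptive cluster sampling. It is not the generalized estimator proposed in the article, whose form the abstract does not give. The sketch assumes a grid population, a condition `y > threshold` for forming networks from 4-neighbour cells, replacement of each unit's value by its network mean, and a classical ratio-type variance estimator (sample variance of the survey variable scaled by the ratio of the known to the sampled auxiliary variance). The toy grids and the function names `network_labels`, `network_means`, and `ratio_variance_estimate` are all hypothetical.

```python
import random
from statistics import variance


def network_labels(y, threshold=0):
    """Assign a network id to every grid cell.

    Adjacent cells (4-neighbourhood) with y > threshold share one network;
    every other cell forms a network of size one.
    """
    rows, cols = len(y), len(y[0])
    labels = [[None] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] is not None:
                continue
            labels[r][c] = next_id
            if y[r][c] > threshold:
                stack = [(r, c)]  # flood fill over qualifying neighbours
                while stack:
                    i, j = stack.pop()
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        a, b = i + di, j + dj
                        if (0 <= a < rows and 0 <= b < cols
                                and labels[a][b] is None
                                and y[a][b] > threshold):
                            labels[a][b] = next_id
                            stack.append((a, b))
            next_id += 1
    return labels


def network_means(values, labels):
    """Replace each unit's value by the mean of its network (row-major list)."""
    totals, counts = {}, {}
    for row_v, row_l in zip(values, labels):
        for v, lab in zip(row_v, row_l):
            totals[lab] = totals.get(lab, 0.0) + v
            counts[lab] = counts.get(lab, 0) + 1
    return [totals[lab] / counts[lab] for row_l in labels for lab in row_l]


def ratio_variance_estimate(w_y, w_x, n, seed=0):
    """Ratio-type variance estimate from an initial simple random sample.

    Assumes the population variance of the auxiliary network means is known.
    Requires n >= 2 so that the sample variances are defined.
    """
    rng = random.Random(seed)
    idx = rng.sample(range(len(w_y)), n)  # initial sample without replacement
    s2_y = variance([w_y[i] for i in idx])
    s2_x = variance([w_x[i] for i in idx])
    S2_x = variance(w_x)  # known auxiliary variance (divisor N - 1)
    if s2_x == 0:
        return s2_y  # no auxiliary variation in the sample: no adjustment
    return s2_y * S2_x / s2_x


# Hypothetical toy population: y is the survey variable, x the auxiliary one.
y = [[0, 0, 0, 0],
     [0, 5, 3, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 2]]
x = [[1, 1, 1, 1],
     [1, 4, 3, 1],
     [1, 1, 1, 1],
     [1, 1, 1, 2]]

labels = network_labels(y)
w_y = network_means(y, labels)
w_x = network_means(x, labels)
estimate = ratio_variance_estimate(w_y, w_x, n=8, seed=1)
print(estimate)
```

The adjustment is only as good as the correlation between the survey and auxiliary network means. When the sampled auxiliary values show no variation, this sketch falls back to the unadjusted sample variance.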