Laga Ian, Bao Le, Niu Xiaoyue
Department of Mathematical Sciences, Montana State University, Bozeman, MT.
Department of Statistics, Pennsylvania State University, University Park, PA.
J Am Stat Assoc. 2023;118(543):1515-1524. doi: 10.1080/01621459.2023.2165929. Epub 2023 Feb 28.
Aggregated relational data (ARD), formed from "How many X's do you know?" questions, are a powerful tool for learning important network characteristics from incomplete network data. Compared with traditional survey methods, ARD are attractive because they do not require a sample from the target population and do not ask respondents to reveal their own status. This helps in studying hard-to-reach populations, such as female sex workers, who may be hesitant to reveal their status. From December 2008 to February 2009, the Kiev International Institute of Sociology (KIIS) collected ARD from 10,866 respondents to estimate the sizes of HIV-related groups in Ukraine. To analyze these data, we propose a new ARD model that incorporates respondent and group covariates in a regression framework and includes a bias term that is correlated between groups. We also introduce a new scaling procedure that uses this correlation structure to further reduce biases. The resulting size estimates of the groups most at risk of HIV infection can improve the efficiency of the HIV response in Ukraine. Additionally, the proposed model lets us better understand two network features without full network data: (1) which characteristics affect whom respondents know, and (2) how knowing someone from one group relates to knowing people from other groups. These features can help researchers recruit marginalized individuals into prevention and treatment programs. Our proposed model and several existing network scale-up method (NSUM) models are implemented in the networkscaleup R package.
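To make concrete how "How many X's do you know?" answers turn into a group size estimate, here is a minimal sketch of the classic basic scale-up estimator (Killworth-style), which is one of the simple existing NSUM approaches. It is not the proposed regression model with correlated biases, and it is not code from the networkscaleup package. The function names and all numbers below are hypothetical toy values. Each respondent's network size is first estimated from counts of known-size groups: d_i = N * sum_k y_ik / sum_k N_k. The unknown group size is then estimated as N_u = N * sum_i y_iu / sum_i d_i.

```python
def estimate_degrees(y_known, known_sizes, total_pop):
    """Estimate each respondent's network size (degree) from known-size groups.

    y_known[i][k] is respondent i's answer for known group k (hypothetical data).
    """
    size_sum = sum(known_sizes)
    return [total_pop * sum(row) / size_sum for row in y_known]


def scale_up_size(y_unknown, degrees, total_pop):
    """Basic scale-up estimate of a hidden group's size.

    y_unknown[i] is respondent i's answer for the hidden group.
    This simple estimator ignores the biases the proposed model accounts for.
    """
    return total_pop * sum(y_unknown) / sum(degrees)


# Hypothetical toy inputs, not KIIS data.
N = 1_000_000
known_sizes = [10_000, 20_000, 30_000]
y_known = [[1, 2, 3], [2, 4, 6]]
y_unknown = [1, 2]

degrees = estimate_degrees(y_known, known_sizes, N)
print(degrees)                                # [100.0, 200.0]
print(scale_up_size(y_unknown, degrees, N))   # 10000.0
```

This basic estimator treats every respondent as equally likely to know members of every group. Departures from that assumption, such as respondents' covariates shaping whom they know or being unaware of a contact's status, are the kind of bias that motivates covariate-based models like the one proposed here.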