Salway Travis, Butt Zahid A, Wong Stanley, Abdia Younathan, Balshaw Robert, Rich Ashleigh J, Ablona Aidan, Wong Jason, Grennan Troy, Yu Amanda, Alvarez Maria, Rossi Carmine, Gilbert Mark, Krajden Mel, Janjua Naveed Z
Faculty of Health Sciences, Simon Fraser University, Burnaby, BC, Canada.
British Columbia Centre for Disease Control, Vancouver, BC, Canada.
Front Digit Health. 2020 Oct 6;2:547324. doi: 10.3389/fdgth.2020.547324. eCollection 2020.
Most public health datasets do not include sexual orientation measures, which limits the data available to monitor health disparities and evaluate tailored interventions. We therefore developed, validated, and applied a novel computable phenotype model to classify men who have sex with men (MSM) using multiple health datasets from British Columbia, Canada, 1990-2015. Three case surveillance databases, a public health laboratory database, and five administrative health databases were linked and deidentified (BC Hepatitis Testers Cohort), resulting in a retrospective cohort of 727,091 adult men. Known MSM status from the three disease case surveillance databases was used to develop a multivariable model for classifying MSM in the full cohort. Models were selected using elastic net (glmnet package) in R, and the final model was chosen to optimize the area under the receiver operating characteristic curve. We compared characteristics of known MSM, classified MSM, and classified heterosexual men. History of gonorrhea and syphilis diagnoses, HIV tests in the past year, a history of visiting an identified gay and bisexual men's clinic, and residence in MSM-dense neighborhoods were all positively associated with being MSM. The selected model had a sensitivity of 72% and a specificity of 94%. Excluding those with known MSM status, a total of 85,521 men (12% of the cohort) were classified as MSM. Computable phenotyping is a promising approach for classification of sexual minorities and investigation of health outcomes in the absence of routinely available self-report data.
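The performance measures reported above (sensitivity, specificity, and area under the receiver operating characteristic curve) can be illustrated with a minimal Python sketch. It does not reproduce the study's elastic-net model, which was fitted in R; the function names, the toy labels and scores, and the 0.5 threshold are all hypothetical and chosen only to show how the three quantities are computed from known status and model scores.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when scores >= threshold count as positive."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve as the probability that a random positive
    scores higher than a random negative (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = known positive status, 0 = known negative status.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05]

toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_scores, threshold=0.5)
toy_auc = auc(toy_labels, toy_scores)
```

Raising the threshold in such a sketch trades sensitivity for specificity, whereas the area under the curve summarizes performance across all thresholds, which is why it is a natural criterion for choosing among candidate models.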