Watkins Sarah Holmes, Testa Christian, Chen Jarvis T, De Vivo Immaculata, Simpkin Andrew J, Tilling Kate, Diez Roux Ana V, Davey Smith George, Waterman Pamela D, Suderman Matthew, Relton Caroline, Krieger Nancy
Population Health Sciences, Bristol Medical School, University of Bristol, Bristol BS8 2BN, UK.
Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol BS8 2BN, UK.
Environ Epigenet. 2023 Jul 15;9(1):dvad005. doi: 10.1093/eep/dvad005. eCollection 2023.
Epigenetic clocks are increasingly being used as a tool to assess the impact of a wide variety of phenotypes and exposures on healthy ageing, with a recent focus on social determinants of health. However, little attention has been paid to the sociodemographic characteristics of participants on whom these clocks have been based. Participant characteristics are important because sociodemographic and socioeconomic factors are known to be associated with both DNA methylation variation and healthy ageing. It is also well known that machine learning algorithms have the potential to exacerbate health inequities through the use of unrepresentative samples - prediction models may underperform in social groups that were poorly represented in the training data used to construct the model. To address this gap in the literature, we conducted a review of the sociodemographic characteristics of the participants whose data were used to construct 13 commonly used epigenetic clocks. We found that although some of the epigenetic clocks were created utilizing data provided by individuals from different ages, sexes/genders, and racialized groups, sociodemographic characteristics are generally poorly reported. Reported information is limited by inadequate conceptualization of the social dimensions and exposure implications of gender and racialized inequality, and socioeconomic data are infrequently reported. It is important for future work to ensure clear reporting of tangible data on the sociodemographic and socioeconomic characteristics of all the participants in the study to ensure that other researchers can make informed judgements about the appropriateness of the model for their study population.