A data model for population descriptors in genomic research.

Authors

Khan Alyna T, Adebamowo Clement, Fullerton Stephanie M, Hirbo Jibril, Konigsberg Iain R, Kraft Peter, Martin Iman, Nelson Sarah C, Ramsay Michèle, Wojcik Genevieve L, Adebamowo Sally N, Conomos Matthew P, Darst Burcu F, Hysong Micah R, Li Yun, Martin Alicia R, Mathias Rasika A, Rich Stephen S, Sakoda Lori C, Schrider Daniel R, Sharma Jayati, Smith Johanna L, Sun Quan, Zhang Yuji, Gogarten Stephanie M

Affiliations

School of Engineering, Design, and Innovation, Pennsylvania State University, University Park, PA, USA.

Department of Epidemiology and Public Health, University of Maryland School of Medicine, Baltimore, MD, USA.

Publication information

Am J Hum Genet. 2025 Jul 3;112(7):1504-1514. doi: 10.1016/j.ajhg.2025.05.011. Epub 2025 Jun 12.

Abstract

Population descriptors used in genetic studies have broad social and translational implications. There are no globally agreed-upon definitions or usages of common population descriptors (e.g., race, ethnicity, nationality, and tribe), many of which are applied ad hoc and/or derived from political or bureaucratic conventions. Recent recommendations have encouraged the retention of as much granularity in population descriptors as possible during data preparation, analysis, and interpretation of research results. However, genomic research infrastructures (i.e., current practices, resources, and workflows in genomic research) often lack systematic and flexible organization, structure, and harmonization of multifaceted and detailed population descriptor data. This can lead to loss of information, barriers to international collaboration, and potential issues in clinical translation. Here, we describe a data model, developed by the NIH-funded Polygenic Risk Methods in Diverse Populations (PRIMED) Consortium, that organizes and retains detailed population descriptor data for future research use. The model supports a versatile, traceable, and reproducible harmonization system that offers multiple benefits over existing data structures. This data model affords researchers the flexibility to thoughtfully choose and scientifically justify their choice of population descriptors. It avoids the conflation of social identities with biological categories and guards against harmful typological inferences. Genomic research tools of this kind will be crucial for producing scientifically robust findings that minimize potential harms of descriptor misuse while maximizing benefits for diverse communities.
