Validating and automating learning of cardiometabolic polygenic risk scores from direct-to-consumer genetic and phenotypic data: implications for scaling precision health research.

Affiliations

Galatea Bio, Inc., 975 W 22nd Street, Hialeah, Florida, 33010, USA.

Amphora Health, Batallon Independencia 80, Morelia, Michoacan, 58260, Mexico.

Publication information

Hum Genomics. 2022 Sep 8;16(1):37. doi: 10.1186/s40246-022-00406-y.

Abstract

INTRODUCTION

A major challenge to enabling precision health at a global scale is the mismatch between those who enroll in state-sponsored genomic research and those who suffer from chronic disease. More than 30 million people have been genotyped by direct-to-consumer (DTC) companies such as 23andMe, Ancestry DNA, and MyHeritage, providing a potential mechanism for democratizing access to medical interventions and thus catalyzing improvements in patient outcomes as the cost of data acquisition drops. However, most of these data remain sequestered within the original provider's network, where the scientific community can neither access nor validate them. Here, we present a novel geno-pheno platform that integrates heterogeneous data sources and applies the resulting learnings to common chronic conditions, including type 2 diabetes (T2D) and hypertension.

METHODS

We collected genotype data through a novel DTC platform where participants uploaded their genotype data files and were invited to answer general health questionnaires regarding cardiometabolic traits over a period of 6 months. Quality control, imputation, and genome-wide association studies (GWAS) were performed on this dataset, and polygenic risk scores (PRS) were built in a case-control setting using the BASIL algorithm.
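As a rough illustration of what a polygenic risk score is once a model has been fitted, the sketch below computes a score as a weighted sum of effect-allele dosages. It does not reproduce BASIL itself, which fits a lasso-penalized regression to select variants and estimate their weights. The variant identifiers, weights, and dosages here are hypothetical placeholders, not values from the study.

```python
from typing import Dict


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum weight * dosage over variants that appear in both mappings.

    dosages: effect-allele dosage per variant for one individual (0.0 to 2.0).
    weights: per-variant effect size, e.g. log-odds from a fitted sparse model.
    Variants missing from the individual's data are skipped (an assumption).
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


# Hypothetical weights and one hypothetical participant.
weights = {"rs_a": 0.12, "rs_b": -0.05, "rs_c": 0.30}
person = {"rs_a": 2.0, "rs_b": 1.0, "rs_c": 0.0}

score = polygenic_score(person, weights)
print(f"PRS = {score:.2f}")
```

In a case-control setting, scores like this are then compared between cases and controls, for example through the AUC.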

RESULTS

For T2D, we collected data on N = 4,550 participants (389 cases / 4,161 controls), with cases defined as those who reported being affected or previously affected; for hypertension, N = 4,528 (1,027 cases / 3,501 controls). Of 272 variants with previously reported genome-wide significant associations in Europeans, 164 showed the same effect direction in our data. The PRS models reached an AUC of 0.68, which is comparable to previously published PRS models obtained with larger datasets including clinical biomarkers.
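The two summary statistics above can be made concrete with a small sketch. The first function is a one-sided exact binomial (sign) test, one common way to ask whether 164 concordant effect directions out of 272 exceeds the 50% expected by chance; the abstract does not state that the authors ran this test. The second function computes AUC as the probability that a randomly chosen case scores higher than a randomly chosen control. The example scores are made up for illustration.

```python
from math import comb
from typing import Sequence


def sign_test_p(concordant: int, total: int) -> float:
    """One-sided exact binomial P(X >= concordant) under p = 0.5."""
    tail = sum(comb(total, k) for k in range(concordant, total + 1))
    return tail / 2 ** total


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Fraction of case/control pairs where the case scores higher (ties count 0.5)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Counts taken from the abstract: 164 of 272 variants share effect direction.
p_value = sign_test_p(164, 272)
print(f"concordance = {164 / 272:.3f}, one-sided p = {p_value:.2g}")

# Hypothetical PRS values for two cases and two controls.
toy_auc = auc([0.9, 0.4], [0.1, 0.5])
print(f"toy AUC = {toy_auc}")
```

The pairwise AUC loop is quadratic in sample size, which is fine for a toy example; a rank-based implementation would be preferable for thousands of participants.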

DISCUSSION

DTC platforms have the potential to invert the research models of genome sequencing and phenotypic data acquisition. Quality control (QC) mechanisms proved sufficient to enable traditional GWAS and PRS analyses. The direct participation of individuals has shown the potential to generate rich datasets that support the creation of cardiometabolic PRS models. More importantly, federated learning of PRS from reused DTC data provides a mechanism for scaling precision health care delivery beyond the small number of countries that can afford to finance these efforts directly.

CONCLUSIONS

The genetics of T2D and hypertension have been studied extensively in controlled datasets, and various polygenic risk scores (PRS) have been developed. We developed predictive tools for both phenotypes, trained on heterogeneous genotypic and phenotypic data generated outside the clinical environment, and show that our methods can recapitulate prior findings with fidelity. From these observations, we conclude that it is possible to leverage DTC genetic repositories to identify individuals at risk of debilitating diseases based on their unique genetic landscape, so that informed, timely clinical interventions can be made.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2eae/9454216/1333ca07b24d/40246_2022_406_Fig1_HTML.jpg
