Optimizing UK biobank cloud-based research analysis platform to fine-map coronary artery disease loci in whole genome sequencing data.

Authors

Sng Letitia M F, Kaphle Anubhav, O'Brien Mitchell J, Hosking Brendan, Reguant Roc, Verjans Johan, Jain Yatish, Twine Natalie A, Bauer Denis C

Affiliations

Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Westmead, New South Wales, Australia.

Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Melbourne, Victoria, Australia.

Publication information

Sci Rep. 2025 Mar 25;15(1):10335. doi: 10.1038/s41598-025-95286-2.

DOI:10.1038/s41598-025-95286-2
PMID:40133599
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11937306/
Abstract

We conducted the first comprehensive association analysis of a coronary artery disease (CAD) cohort within the recently released UK Biobank (UKB) whole genome sequencing dataset. We employed the fine-mapping tool PolyFun and pinpointed rs10757274 as the most likely causal single-nucleotide variant (SNV) within the 9p21.3 CAD risk locus. Notably, we show that the machine-learning (ML) approaches REGENIE and VariantSpark exhibited greater sensitivity than traditional single-SNV logistic regression, uncovering rs28451064, a known risk variant in 21q22.11. Our findings underscore the utility of leveraging advanced computational techniques and cloud-based resources for mega-biobank analyses. Aligning with the paradigm shift of bringing compute to data, we demonstrate a 44% cost reduction and 94% speedup through compute architecture optimisation on UK Biobank's Research Analysis Platform using our RAPpoet approach. We discuss three considerations for researchers implementing novel workflows for datasets hosted on cloud platforms, to pave the way for harnessing mega-biobank-sized data through scalable, cost-effective cloud computing solutions.
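The abstract reports a 44% cost reduction and a 94% speedup but does not state how either percentage is defined. The sketch below shows two common conventions, assuming a "reduction" is measured relative to the baseline and a "speedup" is the gain in throughput. The function names and all input figures are hypothetical and are not taken from the paper.

```python
def percent_reduction(baseline: float, optimised: float) -> float:
    """Percentage by which `optimised` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - optimised) / baseline


def percent_speedup(baseline_time: float, optimised_time: float) -> float:
    """Throughput gain in percent: 100 * (baseline_time / optimised_time - 1)."""
    if baseline_time <= 0 or optimised_time <= 0:
        raise ValueError("times must be positive")
    return 100.0 * (baseline_time / optimised_time - 1.0)


# Hypothetical figures chosen only to illustrate the arithmetic.
baseline_cost, optimised_cost = 100.0, 56.0      # arbitrary currency units
baseline_hours, optimised_hours = 97.0, 50.0     # wall-clock runtime

cost_cut = percent_reduction(baseline_cost, optimised_cost)
time_cut = percent_reduction(baseline_hours, optimised_hours)
speedup = percent_speedup(baseline_hours, optimised_hours)

print(f"cost reduction:    {cost_cut:.1f}%")
print(f"runtime reduction: {time_cut:.1f}%")
print(f"throughput gain:   {speedup:.1f}%")
```

Under the throughput convention, a 94% speedup corresponds to roughly halving the runtime. If the figure instead meant a 94% runtime reduction, the implied throughput gain would be far larger, so it is worth checking the full text for the definition used.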

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1816/11937306/53c4bbac9006/41598_2025_95286_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1816/11937306/ffe396709b6e/41598_2025_95286_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1816/11937306/3c5b158fc417/41598_2025_95286_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1816/11937306/883c5bc4688b/41598_2025_95286_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1816/11937306/168013b0c550/41598_2025_95286_Fig5_HTML.jpg