Celeste: A cloud-based genomics infrastructure with variant-calling pipeline suited for population-scale sequencing projects.

Author information

Siddiqui Noora, Lee Breanna, Yi Victoria, Farek Jesse, Khan Ziad, Kalla Sara E, Wang Qiaoyan, Walker Kimberly, Meldrim James, Kachulis Christopher, Gatzen Michael, Lennon Niall J, Mehtalia Shyamal, Catreux Severine, Mehio Rami, Gibbs Richard A, Venner Eric

Affiliations

Prostate Cancer Clinical Trials Consortium, New York, NY, USA.

Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.

Publication information

medRxiv. 2025 Apr 30:2025.04.29.25326690. doi: 10.1101/2025.04.29.25326690.

Abstract

BACKGROUND

The Research Program () is one of the world's largest sequencing efforts and will generate genetic data for over one million individuals from diverse backgrounds. This megaproject will create novel research platforms that integrate an unprecedented amount of genetic data with longitudinal health information. Here, we describe the design of Celeste, a resilient, open-source cloud architecture for implementing genomics workflows. Celeste has successfully analyzed petabytes of participant genomic information for the program, and it offers other large-scale sequencing efforts a comprehensive set of tools to power their analysis. The Celeste infrastructure is highly scalable and has routinely processed fluctuating workloads of up to 9,000 whole-genome sequencing (WGS) samples per month for the program. It also lends itself to multiple projects. Serverless technology and container orchestration form the basis of Celeste's system for managing this volume of data.

RESULTS

In 12 months of production within a single Amazon Web Services (AWS) Region, around 200 million serverless function invocations and over 20 million messages coordinated the analysis of 1.8 million bioinformatics, quality control, and clinical reporting jobs. Adapting WGS analysis to clinical projects requires adjusting variant-calling methods so that variants of known clinical importance are detected more reliably. We therefore also share the process by which we tuned the variant-calling pipeline used by the multiple genome centers supporting the program, to maximize precision and accuracy for low-fraction variant calls with clinical significance.
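
To put the production figures above in proportion, the short Python sketch below derives rough per-job ratios from them. It uses only the numbers quoted in this abstract; because those are rounded ("around", "over"), the results are order-of-magnitude estimates, and the constant and function names are illustrative rather than taken from the Celeste codebase.

```python
# Rough per-job ratios from the approximate figures quoted in RESULTS.
# The inputs are rounded in the abstract, so treat the outputs as
# order-of-magnitude estimates, not measured values.

FUNCTION_INVOCATIONS = 200_000_000  # "around 200 million serverless functions"
MESSAGES = 20_000_000               # "over 20 million messages" (a lower bound)
JOBS = 1_800_000                    # "1.8 million ... jobs"
MONTHS = 12                         # "12 months of production"


def per_job(total: int, jobs: int = JOBS) -> float:
    """Average number of events (invocations, messages) per analysis job."""
    return total / jobs


functions_per_job = per_job(FUNCTION_INVOCATIONS)
messages_per_job = per_job(MESSAGES)
jobs_per_month = JOBS / MONTHS

print(f"~{functions_per_job:.0f} function invocations per job")
print(f"~{messages_per_job:.0f} messages per job (at least)")
print(f"~{jobs_per_month:,.0f} jobs per month")
```

On these assumptions, each job involved on the order of a hundred function invocations and about ten messages, at an average rate of roughly 150,000 jobs per month.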

CONCLUSIONS

When combined with hardware-accelerated implementations for genomic analysis, Celeste had far-reaching, positive implications for turnaround time, dynamic scalability, security, and storage in the analysis of one hundred thousand whole-genome samples and counting. Other groups may align their sequencing workflows to this harmonized pipeline standard, which is included within the Celeste framework, to meet clinical requirements for population-scale sequencing efforts. Celeste is available on GitHub as an Amazon Web Services (AWS) deployment and includes command-line parameters and software containers.
