Stanford Center for Genomics and Personalized Medicine, Stanford University, Stanford, CA, USA.
Palo Alto Epidemiology Research and Information Center for Genomics, VA Palo Alto, CA, USA.
Sci Rep. 2021 Dec 1;11(1):23229. doi: 10.1038/s41598-021-02569-5.
Biomedical studies have grown in size and now yield large quantities of data, yet processing these data efficiently remains a challenge. Here we present Trellis, a cloud-based data and task management framework that fully automates the process from data ingestion to result presentation, while tracking data lineage, facilitating information queries, and supporting fault tolerance and scalability. Using a graph database to coordinate the state of the data processing workflows and a scalable microservice architecture to perform bioinformatics tasks, Trellis has enabled efficient variant calling on 100,000 human genomes collected in the VA Million Veteran Program.
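The idea of using a graph to coordinate workflow state and track data lineage can be illustrated with a minimal sketch. This is not Trellis code: the `LineageGraph` class, the `TRIGGERS` table, the `ingest` function, and the task and label names (`align`, `call_variants`, `fastq`, `bam`, `vcf`) are all hypothetical, and a plain in-memory dictionary stands in for the graph database. Each newly registered data node is checked against a trigger table, and each matching trigger launches a task whose output becomes a new node linked back to its input.

```python
from collections import defaultdict


class LineageGraph:
    """Toy in-memory stand-in for a graph database (hypothetical)."""

    def __init__(self):
        self.nodes = {}                    # node id -> label
        self.parents = defaultdict(list)   # node id -> ids it was derived from

    def add_node(self, node_id, label, derived_from=()):
        self.nodes[node_id] = label
        self.parents[node_id].extend(derived_from)

    def lineage(self, node_id):
        """Return every ancestor of node_id, nearest first."""
        seen, queue = [], list(self.parents[node_id])
        while queue:
            current = queue.pop(0)
            if current not in seen:
                seen.append(current)
                queue.extend(self.parents[current])
        return seen


# Hypothetical trigger table: label of a new node -> (task to launch, output label).
TRIGGERS = {
    "fastq": ("align", "bam"),
    "bam": ("call_variants", "vcf"),
}


def ingest(graph, node_id, label, derived_from=()):
    """Register a data node, then launch whatever task its label triggers."""
    graph.add_node(node_id, label, derived_from)
    launched = []
    trigger = TRIGGERS.get(label)
    if trigger:
        task, output_label = trigger
        launched.append(task)
        # Assumption: a real system would hand the task to a separate service
        # and register the output when it finishes; here it is registered at once.
        output_id = f"{node_id}.{output_label}"
        launched += ingest(graph, output_id, output_label, derived_from=[node_id])
    return launched


graph = LineageGraph()
tasks = ingest(graph, "sample1", "fastq")
```

In this sketch, ingesting one raw file cascades through alignment and variant calling without any explicit pipeline definition, and `graph.lineage("sample1.bam.vcf")` walks back from the final output to the inputs it was derived from.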