Kadri Sabah, Sboner Andrea, Sigaras Alexandros, Roy Somak
Department of Bioinformatics, Ann & Robert H Lurie Children's Hospital, Chicago, Illinois.
Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, New York; Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, New York; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York.
J Mol Diagn. 2022 May;24(5):442-454. doi: 10.1016/j.jmoldx.2022.01.006. Epub 2022 Feb 18.
Systematic implementation of bioinformatics resources for next-generation sequencing (NGS)-based clinical testing is an arduous undertaking. One of the key challenges is developing an information technology infrastructure ecosystem that enables scalable and reproducible bioinformatics services and is resilient and secure enough to handle genetic and protected health information, often while embedded in an existing infrastructure not oriented toward bioinformatics. Container technology provides an ideal, infrastructure-agnostic solution for molecular laboratories developing and using bioinformatics pipelines, whether on premises or in the cloud. A container provides a consistent computational environment and enables reproducibility, scalability, and security when developing NGS bioinformatics analysis pipelines. Containers can increase the bioinformatics team's productivity by automating and simplifying the maintenance of complex bioinformatics resources, and they facilitate the validation, version control, and documentation necessary for clinical laboratory regulatory compliance. Although containers are increasingly popular for developing NGS bioinformatics pipelines, their usage varies widely and inconsistently, which may result in suboptimal performance and potentially compromise the security and privacy of protected health information. In this article, the authors highlight the current state and provide best or recommended practices for building and using containers in NGS bioinformatics solutions in a clinical setting, with a focus on scalability, optimization, maintainability, and data security.