Elisseev Vadim, Gardiner Laura-Jayne, Krishna Ritesh
IBM Research Europe, Hartree Centre, Daresbury Laboratory, Keckwick Lane, Warrington WA4 4AD, Cheshire, UK.
Wrexham Glyndwr University, Mold Rd, Wrexham LL11 2AW, Wales, UK.
Comput Struct Biotechnol J. 2022 Apr 20;20:1914-1924. doi: 10.1016/j.csbj.2022.04.014. eCollection 2022.
We present a proof-of-concept implementation of the in-memory computing paradigm, which we use to facilitate the analysis of metagenomic sequencing reads. In doing so, we compare the performance of POSIX™ file systems and key-value storage for omics data, and we show the potential for integrating high-performance computing (HPC) and cloud-native technologies. We show that in-memory key-value storage can improve the handling of omics data through more flexible and faster data processing. We envision fully containerized workflows deployed as portable micro-pipelines, with multiple instances working concurrently on the same distributed in-memory storage. To highlight the potential of this technology for event-driven and real-time data processing, we use a biological case study focused on the growing threat of antimicrobial resistance (AMR). We develop a workflow combining bioinformatics and explainable machine learning (ML) to predict the life expectancy of a population from the microbiome of its sewage, while describing the contribution of AMR to the prediction. We propose that, in future, performing such analyses in 'real time' would allow us to assess the potential risk to the population from changes in the AMR profile of the community.
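To make the key-value idea concrete, the sketch below loads FASTQ sequencing reads into an in-memory store keyed by read ID, so later pipeline steps can fetch individual reads by key instead of re-parsing a file sequentially. This is a minimal illustration under stated assumptions, not the authors' implementation: a plain Python `dict` stands in for a distributed in-memory key-value store, and the names `parse_fastq`, `InMemoryReadStore` and `load_reads` are hypothetical.

```python
from typing import Dict, Iterator, Tuple

Read = Tuple[str, str]  # (sequence, quality string)


def parse_fastq(text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from 4-line FASTQ records."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text must contain complete 4-line records")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        read_id = header[1:].split()[0]  # ID is the first token after '@'
        yield read_id, seq, qual


class InMemoryReadStore:
    """Hypothetical stand-in for a distributed in-memory key-value store."""

    def __init__(self) -> None:
        self._data: Dict[str, Read] = {}

    def put(self, key: str, value: Read) -> None:
        self._data[key] = value

    def get(self, key: str) -> Read:
        return self._data[key]

    def __len__(self) -> int:
        return len(self._data)


def load_reads(store: InMemoryReadStore, fastq_text: str) -> int:
    """Insert every FASTQ record into the store; return the number loaded."""
    count = 0
    for read_id, seq, qual in parse_fastq(fastq_text):
        store.put(read_id, (seq, qual))
        count += 1
    return count


sample = """@read1 sample=A
ACGTACGT
+
IIIIIIII
@read2
GGCCTTAA
+
HHHHHHHH
"""

store = InMemoryReadStore()
n = load_reads(store, sample)
print(n, store.get("read2"))  # 2 ('GGCCTTAA', 'HHHHHHHH')
```

In a real deployment the `put`/`get` calls would go to a shared networked store, which is what would let several containerized workers read the same data concurrently; the dictionary here only models the access pattern within a single process.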