Scalable in-memory processing of omics workflows.

Authors

Elisseev Vadim, Gardiner Laura-Jayne, Krishna Ritesh

Affiliations

IBM Research Europe, Hartree Centre, Daresbury Laboratory, Keckwick Lane, Warrington WA4 4AD, Cheshire, UK.

Wrexham Glyndwr University, Mold Rd, Wrexham LL11 2AW, Wales, UK.

Publication information

Comput Struct Biotechnol J. 2022 Apr 20;20:1914-1924. doi: 10.1016/j.csbj.2022.04.014. eCollection 2022.

DOI: 10.1016/j.csbj.2022.04.014
PMID: 35521547
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9052061/

Abstract

We present a proof-of-concept implementation of the in-memory computing paradigm that we use to facilitate the analysis of metagenomic sequencing reads. In doing so, we compare the performance of POSIX™ file systems and key-value storage for omics data, and we show the potential for integrating high-performance computing (HPC) and cloud-native technologies. We show that in-memory key-value storage offers possibilities for improved handling of omics data through more flexible and faster data processing. We envision fully containerized workflows and their deployment in portable micro-pipelines, with multiple instances working concurrently with the same distributed in-memory storage. To highlight the potential usage of this technology for event-driven and real-time data processing, we use a biological case study focused on the growing threat of antimicrobial resistance (AMR). We develop a workflow encompassing bioinformatics and explainable machine learning (ML) to predict the life expectancy of a population based on the microbiome of its sewage, while providing a description of the AMR contribution to the prediction. We propose that, in the future, performing such analyses in 'real time' would allow us to assess the potential risk to the population based on changes in the AMR profile of the community.
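
As a rough illustration of the key-value idea described in the abstract, the sketch below loads sequencing reads from a FASTQ stream into a key-value mapping and then retrieves a single read by its identifier, with no rescan of a file. It is a minimal sketch under stated assumptions, not the authors' implementation: a plain Python `dict` stands in for the distributed in-memory store (the abstract does not name the one used), and the `read:<id>` key scheme, the helper names, and the two-read sample are hypothetical.

```python
import io


def parse_fastq(handle):
    """Yield (read_id, sequence, quality) tuples from a FASTQ text stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of stream
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        # Drop the leading '@' and any description after the first space.
        yield header[1:].split()[0], sequence, quality


def load_reads(handle, store):
    """Insert each read under the hypothetical key 'read:<id>'; return the count."""
    count = 0
    for read_id, sequence, quality in parse_fastq(handle):
        store[f"read:{read_id}"] = (sequence, quality)
        count += 1
    return count


def gc_fraction(store, read_id):
    """Fetch one read by key and compute its GC fraction."""
    sequence, _quality = store[f"read:{read_id}"]
    return sum(base in "GC" for base in sequence) / len(sequence)


# Hypothetical two-read sample; a real workflow would stream a sequencing file.
FASTQ_TEXT = "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n"

store = {}  # assumption: stand-in for a distributed in-memory key-value store
n_loaded = load_reads(io.StringIO(FASTQ_TEXT), store)
print(n_loaded, gc_fraction(store, "r1"), gc_fraction(store, "r2"))
```

Because the helpers only use item assignment and lookup on `store`, any mapping-like client with the same interface could in principle be swapped in, which is the kind of decoupling between pipeline steps and storage that the abstract points toward.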
