Distributed gene clinical decision support system based on cloud computing.

Authors

Xu Bo, Li Changlong, Zhuang Hang, Wang Jiali, Wang Qingfeng, Wang Chao, Zhou Xuehai

Affiliation

School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230027, China.

Publication

BMC Med Genomics. 2018 Nov 20;11(Suppl 5):100. doi: 10.1186/s12920-018-0415-1.

DOI:10.1186/s12920-018-0415-1
PMID:30454054
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6245588/
Abstract

BACKGROUND

Clinical decision support systems can compensate for the limits of an individual doctor's knowledge and reduce the likelihood of misdiagnosis, improving health care. Traditional genetic data storage and analysis methods, which run in stand-alone environments, have limited scalability and struggle to meet the computational requirements of rapidly growing genetic data.

METHODS

In this paper, we propose a distributed gene clinical decision support system named GCDSS and implement a prototype based on cloud computing technology. We also present CloudBWA, a novel distributed read mapping algorithm that uses a batch processing strategy to map reads on Apache Spark.
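
The abstract does not describe how CloudBWA batches reads, so the following is only a minimal sketch of the general idea of batch processing within one data partition: reads are grouped into fixed-size batches and each batch is handed to the aligner in a single call, which amortizes per-call overhead. The names `batched`, `map_partition`, `toy_align_batch`, and the `batch_size` parameter are hypothetical, and the toy aligner merely reports read lengths. On Spark, a function shaped like `map_partition` is the kind of callable one would typically pass to `RDD.mapPartitions`; plain Python is used here so the sketch runs without a cluster.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple

# Simplified stand-in for a sequencing read: (read name, bases).
Read = Tuple[str, str]


def batched(reads: Iterable[Read], batch_size: int) -> Iterator[List[Read]]:
    """Yield consecutive lists of at most `batch_size` reads."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    it = iter(reads)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def map_partition(
    reads: Iterable[Read],
    align_batch: Callable[[List[Read]], List[str]],
    batch_size: int = 2,
) -> Iterator[str]:
    """Align one partition batch by batch instead of read by read."""
    for batch in batched(reads, batch_size):
        # One aligner call per batch amortizes the per-call overhead.
        yield from align_batch(batch)


def toy_align_batch(batch: List[Read]) -> List[str]:
    """Hypothetical placeholder aligner: emits name and read length only."""
    return [f"{name}\t{len(bases)}" for name, bases in batch]


if __name__ == "__main__":
    partition = [("r1", "ACGT"), ("r2", "AC"), ("r3", "ACGTA")]
    for record in map_partition(partition, toy_align_batch, batch_size=2):
        print(record)
```

In this sketch the batch size trades memory per call against call overhead; how the published algorithm chooses it is not stated in the abstract.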

RESULTS

Experiments show that GCDSS and CloudBWA deliver outstanding performance and excellent scalability. Compared with state-of-the-art distributed algorithms, CloudBWA achieves up to a 2.63-fold speedup over SparkBWA. Compared with stand-alone algorithms, CloudBWA with 16 cores achieves up to an 11.59-fold speedup over BWA-MEM with 1 core.
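
To put the reported figures in context, the short sketch below computes speedup as the ratio of baseline to candidate run time, and parallel efficiency as speedup divided by core count. The efficiency value is derived here from the abstract's numbers and is not reported by the authors; because the baseline is a different program (BWA-MEM on 1 core), it is only a rough indicator. The function names are hypothetical.

```python
def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Speedup of the candidate over the baseline (baseline time / candidate time)."""
    if candidate_seconds <= 0:
        raise ValueError("candidate_seconds must be positive")
    return baseline_seconds / candidate_seconds


def parallel_efficiency(speedup_value: float, cores: int) -> float:
    """Fraction of ideal linear scaling achieved on `cores` cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return speedup_value / cores


if __name__ == "__main__":
    # 11.59x speedup on 16 cores, as reported in the abstract.
    eff = parallel_efficiency(11.59, 16)
    print(f"{eff:.1%}")  # prints 72.4%
```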

CONCLUSIONS

GCDSS is a distributed gene clinical decision support system based on cloud computing techniques. In particular, it incorporates a distributed genetic data analysis pipeline framework. To speed up data processing in GCDSS, we propose CloudBWA, a novel distributed read mapping algorithm that applies batch processing in the mapping stage on the Apache Spark platform.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/94e9/6245588/0b8b142c6dd6/12920_2018_415_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/94e9/6245588/0a42b9cb850d/12920_2018_415_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/94e9/6245588/2c7dff68031c/12920_2018_415_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/94e9/6245588/4ee933b39870/12920_2018_415_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/94e9/6245588/dbbeff618865/12920_2018_415_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/94e9/6245588/00c8cc67a280/12920_2018_415_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/94e9/6245588/ff273340bf2f/12920_2018_415_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/94e9/6245588/2db9b96385eb/12920_2018_415_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/94e9/6245588/a775bb8532b7/12920_2018_415_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/94e9/6245588/53c9043d0bcb/12920_2018_415_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/94e9/6245588/a01636387784/12920_2018_415_Fig11_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/94e9/6245588/1bdce46bd96d/12920_2018_415_Fig12_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/94e9/6245588/ed55197ce896/12920_2018_415_Fig13_HTML.jpg
