The Widening Gulf between Genomics Data Generation and Consumption: A Practical Guide to Big Data Transfer Technology.

Authors

Feltus Frank A, Breen Joseph R, Deng Juan, Izard Ryan S, Konger Christopher A, Ligon Walter B, Preuss Don, Wang Kuang-Ching

Affiliations

Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA.

University of Utah Center for High Performance Computing, Salt Lake City, UT, USA.

Publication information

Bioinform Biol Insights. 2015 Sep 23;9(Suppl 1):9-19. doi: 10.4137/BBI.S28988. eCollection 2015.

DOI:10.4137/BBI.S28988
PMID:26568680
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4636112/
Abstract

In the last decade, high-throughput DNA sequencing has become a disruptive technology and pushed the life sciences into a distributed ecosystem of sequence data producers and consumers. Given the power of genomics and declining sequencing costs, biology is an emerging "Big Data" discipline that will soon enter the exabyte data range when all subdisciplines are combined. These datasets must be transferred across commercial and research networks in creative ways since sending data without thought can have serious consequences on data processing time frames. Thus, it is imperative that biologists, bioinformaticians, and information technology engineers recalibrate data processing paradigms to fit this emerging reality. This review attempts to provide a snapshot of Big Data transfer across networks, which is often overlooked by many biologists. Specifically, we discuss four key areas: 1) data transfer networks, protocols, and applications; 2) data transfer security including encryption, access, firewalls, and the Science DMZ; 3) data flow control with software-defined networking; and 4) data storage, staging, archiving and access. A primary intention of this article is to orient the biologist in key aspects of the data transfer process in order to frame their genomics-oriented needs to enterprise IT professionals.
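
The abstract's point that sending data without thought can stretch data processing time frames can be made concrete with a little arithmetic. The following is a minimal sketch, not taken from the article: the function name `transfer_time_hours`, the 1 TB dataset size, the link rates, and the `efficiency` parameter are all illustrative assumptions, and the calculation ignores protocol overhead, latency, and disk speed.

```python
def transfer_time_hours(size_bytes: float, link_bits_per_s: float,
                        efficiency: float = 1.0) -> float:
    """Idealized hours needed to move size_bytes over a link.

    efficiency is a hypothetical fraction of the nominal link rate that the
    transfer actually achieves (1.0 means the full rate).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = size_bytes * 8 / (link_bits_per_s * efficiency)
    return seconds / 3600


TB = 10**12  # one terabyte (decimal), an assumed dataset size

# Assumed nominal link rates, chosen only for illustration.
for label, rate in [("100 Mbps", 100e6), ("1 Gbps", 1e9), ("10 Gbps", 10e9)]:
    print(f"1 TB over {label}: {transfer_time_hours(TB, rate):.1f} h")
```

Under these assumptions the same dataset takes roughly a day at 100 Mbps and a few hours at 1 Gbps, which is the kind of gap a biologist may want to raise with IT staff when planning a transfer.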

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/442f/4636112/ff352c3b0b2b/bbi-suppl.1-2015-009f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/442f/4636112/9e9699dc30e0/bbi-suppl.1-2015-009f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/442f/4636112/72e01be7e4e6/bbi-suppl.1-2015-009f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/442f/4636112/06c3c2a3b631/bbi-suppl.1-2015-009f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/442f/4636112/a2f304ff407c/bbi-suppl.1-2015-009f5.jpg

Similar articles

1
The Widening Gulf between Genomics Data Generation and Consumption: A Practical Guide to Big Data Transfer Technology.
Bioinform Biol Insights. 2015 Sep 23;9(Suppl 1):9-19. doi: 10.4137/BBI.S28988. eCollection 2015.
2
Bioinformatics software for biologists in the genomics era.
Bioinformatics. 2007 Jul 15;23(14):1713-7. doi: 10.1093/bioinformatics/btm239. Epub 2007 May 7.
3
Machine learning for Big Data analytics in plants.
Trends Plant Sci. 2014 Dec;19(12):798-808. doi: 10.1016/j.tplants.2014.08.004. Epub 2014 Sep 12.
4
Lessons learnt on the analysis of large sequence data in animal genomics.
Anim Genet. 2018 Jun;49(3):147-158. doi: 10.1111/age.12655. Epub 2018 Apr 6.
5
The Medical Science DMZ.
J Am Med Inform Assoc. 2016 Nov;23(6):1199-1201. doi: 10.1093/jamia/ocw032. Epub 2016 May 2.
6
Trends in IT Innovation to Build a Next Generation Bioinformatics Solution to Manage and Analyse Biological Big Data Produced by NGS Technologies.
Biomed Res Int. 2015;2015:904541. doi: 10.1155/2015/904541. Epub 2015 Jun 1.
7
Efficient and Secure Privacy Analysis for Medical Big Data Using TDES and MKSVM with Access Control in Cloud.
J Med Syst. 2019 Jul 4;43(8):265. doi: 10.1007/s10916-019-1374-6.
8
Big Data in Plant Science: Resources and Data Mining Tools for Plant Genomics and Proteomics.
Methods Mol Biol. 2016;1415:533-47. doi: 10.1007/978-1-4939-3572-7_27.
9
Security Issues for Mobile Medical Imaging: A Primer.
Radiographics. 2015 Oct;35(6):1814-24. doi: 10.1148/rg.2015140039.
10
Translational Metabolomics of Head Injury: Exploring Dysfunctional Cerebral Metabolism with Ex Vivo NMR Spectroscopy-Based Metabolite Quantification

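Cited by

1
Regulating appetite in broilers for improving body and muscle development - A review.
J Anim Physiol Anim Nutr (Berl). 2020 Nov;104(6):1819-1834. doi: 10.1111/jpn.13407. Epub 2020 Jun 26.
2
The Importance of Endophenotypes to Evaluate the Relationship between Genotype and External Phenotype.
Int J Mol Sci. 2017 Feb 22;18(2):472. doi: 10.3390/ijms18020472.
3
Harnessing Big Data for Systems Pharmacology.
Annu Rev Pharmacol Toxicol. 2017 Jan 6;57:245-262. doi: 10.1146/annurev-pharmtox-010716-104659. Epub 2016 Oct 13.
4
Big Data Smart Socket (BDSS): a system that abstracts data transfer habits from end users.
Bioinformatics. 2017 Feb 15;33(4):627-628. doi: 10.1093/bioinformatics/btw679.
5
Preface - Access to Knowledge Revisited.
Yearb Med Inform. 2016 May 20;Suppl 1(Suppl 1):S18-20. doi: 10.15265/IYS-2016-s026.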
