Computational solutions to large-scale data management and analysis.

Affiliation

Pacific Biosciences, Menlo Park, California 94025, USA.

Publication information

Nat Rev Genet. 2010 Sep;11(9):647-57. doi: 10.1038/nrg2857.

Abstract

Today we can generate hundreds of gigabases of DNA and RNA sequencing data in a week for less than US$5,000. The astonishing rate of data generation by these low-cost, high-throughput technologies in genomics is being matched by that of other technologies, such as real-time imaging and mass spectrometry-based flow cytometry. Success in the life sciences will depend on our ability to properly interpret the large-scale, high-dimensional data sets that are generated by these technologies, which in turn requires us to adopt advances in informatics. Here we discuss how we can master the different types of computational environments that exist - such as cloud and heterogeneous computing - to successfully tackle our big data problems.
