Pacific Biosciences, Menlo Park, California 94025, USA.
Nat Rev Genet. 2010 Sep;11(9):647-57. doi: 10.1038/nrg2857.
Today we can generate hundreds of gigabases of DNA and RNA sequencing data in a week for less than US$5,000. The astonishing rate of data generation by these low-cost, high-throughput technologies in genomics is being matched by that of other technologies, such as real-time imaging and mass spectrometry-based flow cytometry. Success in the life sciences will depend on our ability to properly interpret the large-scale, high-dimensional data sets that are generated by these technologies, which in turn requires us to adopt advances in informatics. Here we discuss how we can master the different types of computational environments that exist, such as cloud and heterogeneous computing, to successfully tackle our big data problems.