Program in Computational Biology and Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT, 06520, USA.
Department of Molecular Biophysics and Biochemistry, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT, 06520, USA.
Genome Biol. 2019 May 29;20(1):109. doi: 10.1186/s13059-019-1724-1.
Data science allows the extraction of practical insights from large-scale data. Here, we contextualize it as an umbrella term encompassing several disparate subdomains. We focus on how genomics fits as a specific application subdomain, in terms of the well-known 3V data and 4M process frameworks (volume-velocity-variety and measurement-mining-modeling-manipulation, respectively). We further analyze the technical and cultural "exports" and "imports" between genomics and other data-science subdomains (e.g., astronomy). Finally, we discuss how data value, privacy, and ownership are pressing issues for data science applications in general, and are especially relevant to genomics because of the persistent nature of DNA.