Peter V. Coveney, Edward R. Dougherty, Roger R. Highfield
Centre for Computational Science, University College London, Gordon Street, London WC1H 0AJ, UK
Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX 77843-3128, USA
Philos Trans A Math Phys Eng Sci. 2016 Nov 13;374(2080). doi: 10.1098/rsta.2016.0153.
The current interest in big data, machine learning and data analytics has generated the widespread impression that such methods are capable of solving most problems without the need for conventional scientific methods of inquiry. Interest in these methods is intensifying, accelerated by the ease with which digitized data can be acquired in virtually all fields of endeavour, from science, healthcare and cybersecurity to economics, social sciences and the humanities. In multiscale modelling, machine learning appears to provide a shortcut to reveal correlations of arbitrary complexity between processes at the atomic, molecular, meso- and macroscales. Here, we point out the weaknesses of pure big data approaches with particular focus on biology and medicine, which fail to provide conceptual accounts for the processes to which they are applied. No matter their 'depth' and the sophistication of data-driven methods, such as artificial neural nets, in the end they merely fit curves to existing data. Not only do these methods invariably require far larger quantities of data than anticipated by big data aficionados in order to produce statistically reliable results, but they can also fail in circumstances beyond the range of the data used to train them because they are not designed to model the structural characteristics of the underlying system. We argue that it is vital to use theory as a guide to experimental design for maximal efficiency of data collection and to produce reliable predictive models and conceptual knowledge. Rather than continuing to fund, pursue and promote 'blind' big data projects with massive budgets, we call for more funding to be allocated to the elucidation of the multiscale and stochastic processes controlling the behaviour of complex systems, including those of life, medicine and healthcare. This article is part of the themed issue 'Multiscale modelling at the physics-chemistry-biology interface'.
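As a purely illustrative aside (not part of the paper), the abstract's central technical claim, that data-driven curve fitting can succeed within the range of its training data yet fail badly outside it because it encodes no structural knowledge of the underlying process, can be sketched in a few lines of Python. The "true" exponential process, the polynomial surrogate and the sample sizes below are all hypothetical choices standing in for a mechanistic model and a flexible data-driven fit such as a neural network.

# Minimal, hypothetical sketch of extrapolation failure in pure curve fitting.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical underlying process: exponential relaxation y = exp(-x),
# standing in for a theory-based (mechanistic) model of the system.
def true_process(x):
    return np.exp(-x)

# "Big data" regime: many noisy observations, but only over x in [0, 2].
x_train = rng.uniform(0.0, 2.0, 500)
y_train = true_process(x_train) + rng.normal(0.0, 0.02, x_train.size)

# Data-driven surrogate: a high-degree polynomial fit, a stand-in for any
# flexible curve-fitting method trained only on the observed range.
coeffs = np.polyfit(x_train, y_train, deg=9)
surrogate = np.poly1d(coeffs)

# Inside the training range the fit looks excellent ...
x_in = np.linspace(0.0, 2.0, 5)
print("max error inside  [0, 2]:", np.abs(surrogate(x_in) - true_process(x_in)).max())

# ... but outside it the prediction diverges, because the surrogate captures
# no structural characteristics of the process it was fitted to.
x_out = np.linspace(3.0, 5.0, 5)
print("max error outside [3, 5]:", np.abs(surrogate(x_out) - true_process(x_out)).max())

Running the sketch shows small errors on the training interval and rapidly growing errors beyond it, which is the behaviour the authors contrast with theory-guided, mechanistic multiscale models.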