Institute of Medical Informatics and Statistics, Kiel University, University Hospital Schleswig-Holstein, Kiel, Germany.
Pfizer Deutschland GmbH, Berlin, Germany.
BMC Med Res Methodol. 2019 Jun 17;19(1):125. doi: 10.1186/s12874-019-0774-0.
The use of big data is becoming increasingly popular in medical research. Since big data-based projects differ notably from classical research studies, both in terms of scope and quality, it is worth debating whether big data require approaches to scientific reasoning that differ from those established in statistics and the philosophy of science.
The progressing digitalization of our societies generates vast amounts of data that also become available for medical research. Here, the big promise of big data is to facilitate major improvements in the treatment, diagnosis and prevention of diseases. An ongoing examination of the idiosyncrasies of big data is therefore essential to ensure that the field stays congruent with the principles of evidence-based medicine. We discuss the inherent challenges and opportunities of big data in medicine from a methodological point of view, particularly highlighting the relative importance of causality and correlation in commercial and medical research settings. We make a strong case for upholding the distinction between exploratory data analysis facilitating hypothesis generation and confirmatory approaches involving hypothesis validation. An independent verification of research results will be ever more important in the context of big data, where data quality is often hampered by a lack of standardization and structuring.
We argue that it would be both unnecessary and dangerous to discard long-established principles of data generation, analysis and interpretation in the age of big data. While many medical research areas may reasonably benefit from big data analyses, such analyses should nevertheless be complemented by carefully designed (prospective) studies.