How easily can omission of patients, or selection amongst poorly-reproducible measurements, create artificial correlations? Methods for detection and implications for observational research design in cardiology.

Affiliation

International Centre for Circulatory Health, National Heart and Lung Institute, Imperial College London, 59-61 North Wharf Road, London W2 1LA, UK.

Publication details

Int J Cardiol. 2013 Jul 15;167(1):102-13. doi: 10.1016/j.ijcard.2011.12.018. Epub 2012 Jan 27.

Abstract

BACKGROUND

When reported correlation coefficients seem too high to be true, does investigative verification of source data provide suitable reassurance? This study tests how easily omission of patients, or selection amongst irreproducible measurements, generates fictitious strong correlations without any data fabrication.

METHOD AND RESULTS

Two forms of manipulation are applied to a pair of normally-distributed, uncorrelated variables: first, exclusion of patients least favourable to a hypothesised association and, second, making multiple poorly-reproducible measurements per patient and choosing the most supportive. Excluding patients raises correlations powerfully, from 0.0 ± 0.11 (no patients omitted) to 0.40 ± 0.11 (one-fifth omitted), 0.59 ± 0.08 (one-third omitted) and 0.78 ± 0.05 (half omitted). Study size offers no protection: omitting just one-fifth of 75 patients (i.e. publishing 60) makes 92% of correlations statistically significant. Worse, simply selecting the most favourable amongst several measurements raises correlations from 0.0 ± 0.12 (single measurement of each variable) to 0.73 ± 0.06 (best of 2), and 0.90 ± 0.03 (best of 4). 100% of correlation coefficients become statistically significant. Scatterplots may reveal a telltale "shave sign" or "bite sign". Simple statistical tests are presented for these suspicious signatures in single or multiple studies.
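The two manipulations can be illustrated with a minimal simulation sketch. It is not the authors' code, and several details are assumptions that the abstract does not specify: "least favourable" is taken to mean the patients with the largest |x - y| (furthest from the line y = x), repeated measurements are modelled as independent N(0, 1) draws with no stable patient-level component, and the names `omit_least_favourable`, `best_of_k` and `simulate`, along with the default of 75 patients and 200 trials, are hypothetical.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def omit_least_favourable(xs, ys, fraction):
    """Drop the given fraction of patients lying furthest from the line y = x.

    Assumption: this is one plausible reading of "least favourable to a
    hypothesised (positive) association"; the source does not state its rule.
    """
    pairs = sorted(zip(xs, ys), key=lambda p: abs(p[0] - p[1]))
    kept = pairs[: len(pairs) - round(fraction * len(pairs))]
    return [p[0] for p in kept], [p[1] for p in kept]


def best_of_k(rng, n_patients, k):
    """Per patient, take k readings of each variable and keep the (x, y)
    pair that agrees most closely.

    Assumption: readings are independent N(0, 1) draws, i.e. the
    measurements carry no reproducible patient-level signal at all.
    """
    xs, ys = [], []
    for _ in range(n_patients):
        x_reads = [rng.gauss(0, 1) for _ in range(k)]
        y_reads = [rng.gauss(0, 1) for _ in range(k)]
        x, y = min(((a, b) for a in x_reads for b in y_reads),
                   key=lambda p: abs(p[0] - p[1]))
        xs.append(x)
        ys.append(y)
    return xs, ys


def simulate(n_trials=200, n_patients=75, seed=0):
    """Return {label: (mean r, SD of r)} across repeated simulated studies."""
    rng = random.Random(seed)
    results = {}
    # Manipulation 1: omit a fraction of patients from uncorrelated data.
    for fraction in (0.0, 0.2, 1 / 3, 0.5):
        rs = []
        for _ in range(n_trials):
            xs = [rng.gauss(0, 1) for _ in range(n_patients)]
            ys = [rng.gauss(0, 1) for _ in range(n_patients)]
            rs.append(pearson_r(*omit_least_favourable(xs, ys, fraction)))
        results[f"omit_{fraction:.2f}"] = (statistics.fmean(rs), statistics.stdev(rs))
    # Manipulation 2: keep the most supportive of k readings per variable.
    for k in (1, 2, 4):
        rs = [pearson_r(*best_of_k(rng, n_patients, k)) for _ in range(n_trials)]
        results[f"best_of_{k}"] = (statistics.fmean(rs), statistics.stdev(rs))
    return results


if __name__ == "__main__":
    for label, (mean_r, sd_r) in simulate().items():
        print(f"{label:>10}: r = {mean_r:.2f} ± {sd_r:.2f}")
```

Under these assumptions the mean correlation rises with the omitted fraction and with the number of readings per variable, even though the underlying variables are uncorrelated. The exact values depend on the selection rule and noise model chosen here, so they should not be expected to match the published figures.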

CONCLUSION

Correlations are vulnerable to data manipulation. Cardiology is especially vulnerable to patient deletion (because we cardiologists might ourselves completely control enrolment and measurement) and to selection of "best" measurements (because alternative heartbeats are numerous, and some modalities are poorly reproducible). Source data verification cannot detect these manipulations, but tests might highlight suspicious data and, when aggregated across studies, unreliable laboratories or research fields. Cardiological correlation research needs adequately-informed planning and guarantees of integrity, with teeth.
