How does correlation structure differ between real and fabricated data-sets?

Authors

Akhtar-Danesh Noori, Dehghan-Kooshkghazi Mahshid

Affiliation

School of Nursing & Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Canada.

Publication details

BMC Med Res Methodol. 2003 Sep 29;3:18. doi: 10.1186/1471-2288-3-18.

DOI:10.1186/1471-2288-3-18
PMID:14516474
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC212490/
Abstract

BACKGROUND

Misconduct in medical research has been the subject of many papers in recent years. Among the different types of misconduct, data fabrication might be considered one of the most severe. It has been argued that correlation coefficients in fabricated data-sets are usually greater than those found in real data-sets. We aim to study the differences between real and fabricated data-sets in terms of the association between two variables.

METHOD

Three examples are presented in which outcomes from made-up (fabricated) data-sets are compared with the results from three real data-sets and with appropriate simulated data-sets. The data-sets were made up by faculty members at three universities. The first two examples address the correlation structure between continuous variables in two different settings: first, when there is a high correlation coefficient between the variables; second, when the variables are not correlated. In the third example, the differences between the real data-set and the fabricated data-sets are studied using the independent t-test for comparing two means.
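The comparison against simulated data-sets can be illustrated with a minimal sketch. It assumes a standard bivariate normal model, a sample size of 40, and target correlations of 0.0 and 0.8; none of these values are given in the abstract, and the function names `pearson_r` and `simulate_pair` are hypothetical helpers, not the authors' code.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_pair(n, rho, rng):
    """Draw n points from a standard bivariate normal with correlation rho.

    Assumption: normality is chosen for illustration only.
    """
    x, y = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x.append(z1)
        # Mixing z1 and an independent z2 gives corr(x, y) = rho.
        y.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return x, y


rng = random.Random(2003)  # fixed seed so the sketch is reproducible

# Setting 1 (assumed rho = 0.8): highly correlated variables.
x, y = simulate_pair(40, 0.8, rng)
r_correlated = pearson_r(x, y)

# Setting 2 (rho = 0.0): uncorrelated variables.
x, y = simulate_pair(40, 0.0, rng)
r_uncorrelated = pearson_r(x, y)

print(f"simulated r, correlated setting:   {r_correlated:.2f}")
print(f"simulated r, uncorrelated setting: {r_uncorrelated:.2f}")
```

Repeating the simulation many times gives a reference distribution of r for each setting, against which the correlation in a made-up data-set of the same size could be judged.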

RESULTS

In general, higher correlation coefficients are seen in the made-up data-sets than in the real data-sets. This occurs even when the participants are aware that the correlation coefficient for the corresponding real data-set is zero. The findings from the third example, a comparison between the means of two groups, show that many people tend to make up data with smaller or no differences between groups, even when they know how and to what extent the groups differ.

CONCLUSION

This study indicates that high correlation coefficients can be considered a leading sign of data fabrication, as more than 40% of the participants generated variables with correlation coefficients greater than 0.70. However, when inspecting differences between means in different groups, the same rule may not apply, as we observed smaller differences between groups in the made-up data-sets than in the real data-set. We also showed that inspecting the scatter-plot of two variables can be a useful tool for uncovering fabricated data.
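The two checks mentioned above might be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the helpers `share_above` and `pooled_t` are hypothetical, the 0.70 cut-off is taken from the abstract, the pooled-variance form of the independent t statistic is assumed, and the toy numbers are invented purely to exercise the code.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def share_above(datasets, threshold=0.70):
    """Fraction of (x, y) data-sets whose Pearson r exceeds the threshold."""
    flagged = sum(1 for x, y in datasets if pearson_r(x, y) > threshold)
    return flagged / len(datasets)


def pooled_t(a, b):
    """Independent two-sample t statistic, assuming equal variances."""
    na, nb = len(a), len(b)
    mean_a = sum(a) / na
    mean_b = sum(b) / nb
    ss_a = sum((v - mean_a) ** 2 for v in a)
    ss_b = sum((v - mean_b) ** 2 for v in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1 / na + 1 / nb))


# Toy data, invented for illustration only (not from the study).
submitted = [
    ([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]),
    ([1, 2, 3, 4, 5], [3, 1, 4, 2, 3]),
    ([1, 2, 3, 4, 5], [1, 2, 3, 4, 6]),
]
print(f"share with r > 0.70: {share_above(submitted):.2f}")

group_a = [1, 2, 3]
group_b = [2, 3, 4]
print(f"pooled t statistic:  {pooled_t(group_a, group_b):.3f}")
```

A high share of data-sets above the cut-off points toward fabrication in the correlation setting, whereas for group means the abstract reports the opposite pattern: fabricated data tended to show smaller differences, so an unusually small t statistic may be the more telling signal there.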

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c87/212490/6dfbc0a3ba0f/1471-2288-3-18-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c87/212490/6e37d5a175da/1471-2288-3-18-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c87/212490/4967a72e3ab2/1471-2288-3-18-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c87/212490/e917ecb33f1e/1471-2288-3-18-4.jpg
