Algorithmic identification of discrepancies between published ratios and their reported confidence intervals and P-values.

Author information

Arthritis and Clinical Immunology Research Program, Division of Genomics and Data Sciences, Oklahoma Medical Research Foundation, Oklahoma City, Oklahoma 73104-5005.

Department of Biochemistry and Molecular Biology, University of Oklahoma Health Sciences Center.

Publication information

Bioinformatics. 2018 May 15;34(10):1758-1766. doi: 10.1093/bioinformatics/btx811.

Abstract

MOTIVATION

Studies, mostly from the operations/management literature, have shown that the rate of human error increases with task complexity. What is not known is how many errors make it into the published literature, given that they must slip by peer review. By identifying paired, dependent values within text for reported calculations of varying complexity, we can detect discrepancies, quantify error rates and identify mitigating factors.

RESULTS

We extracted statistical ratios from MEDLINE abstracts (hazard ratio, odds ratio, relative risk), their 95% CIs, and their P-values. We re-calculated the ratios and P-values using the reported CIs. For comparison, we also extracted percent-ratio pairs, one of the simplest calculation tasks. Over 486 000 published values were found and analyzed for discrepancies, allowing for rounding and significant figures. Per reported item, discrepancies were less frequent in percent-ratio calculations (2.7%) than in ratio-CI and P-value calculations (5.6-7.5%), and smaller discrepancies were more frequent than large ones. Systematic discrepancies (multiple incorrect calculations of the same type) were higher for more complex tasks (14.3%) than simple ones (6.7%). Discrepancy rates decreased with increasing journal impact factor (JIF) and increasing number of authors, but with diminishing returns and JIF accounting for most of the effect. Approximately 87% of the 81 937 extracted P-values were ≤ 0.05.
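The two kinds of re-calculation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not give their exact procedure or tolerance rules. It assumes the reported 95% CI is symmetric on the log scale, which is the standard assumption for odds, hazard and risk ratios, so the point estimate is the geometric mean of the bounds and the standard error follows from the CI width. The function names `recompute_from_ci` and `percent_matches` and the `decimals` parameter are hypothetical.

```python
import math

# Two-sided 95% quantile of the standard normal distribution.
Z_95 = 1.959963984540054


def recompute_from_ci(lower, upper):
    """Return (ratio, p_value) implied by a 95% CI.

    Assumes the interval is symmetric around log(ratio), so
    log(ratio) is the midpoint of log(lower) and log(upper).
    """
    if lower <= 0 or upper <= lower:
        raise ValueError("need 0 < lower < upper")
    log_lo, log_hi = math.log(lower), math.log(upper)
    ratio = math.exp((log_lo + log_hi) / 2)      # geometric mean of the bounds
    se = (log_hi - log_lo) / (2 * Z_95)          # standard error of log(ratio)
    z = abs(math.log(ratio)) / se
    p_value = math.erfc(z / math.sqrt(2))        # two-sided normal P-value
    return ratio, p_value


def percent_matches(numerator, denominator, reported_percent, decimals=0):
    """True if the reported percent agrees with numerator/denominator
    once rounding to `decimals` places is allowed for."""
    exact = 100.0 * numerator / denominator
    tolerance = 0.5 * 10 ** (-decimals)
    return abs(exact - reported_percent) <= tolerance + 1e-9


if __name__ == "__main__":
    ratio, p = recompute_from_ci(1.20, 1.90)
    print(f"implied ratio = {ratio:.2f}, implied P = {p:.4f}")
    print(percent_matches(12, 48, 25))   # 12/48 reported as 25%
    print(percent_matches(12, 48, 35))   # 12/48 reported as 35%
```

A reported ratio or P-value would then be compared with the implied value. A realistic check also has to allow for rounding of the CI bounds themselves, since each bound is typically reported to only two or three significant figures.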

CONCLUSION

Using a simple, yet accurate, approach to identifying paired values within text, we offer the first quantitative evaluation of published error frequencies within these types of calculations.
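As a rough illustration of what identifying paired values within text can look like, the sketch below pulls a ratio and its 95% CI out of a sentence with a regular expression. The pattern, the name `extract_ratio_ci` and the set of abbreviations it recognises are assumptions made for this example; the abstract does not describe the authors' actual extraction rules, and real abstracts use many more phrasings than this pattern covers.

```python
import re

# Hypothetical pattern: an abbreviation (OR, HR or RR), a ratio,
# then "95% CI" followed by a lower and an upper bound.
RATIO_CI = re.compile(
    r"\b(?P<kind>OR|HR|RR)\s*[=:]?\s*(?P<ratio>\d+(?:\.\d+)?)"
    r"\s*[(\[,;]\s*95\s*%\s*CI\s*[=:,]?\s*"
    r"(?P<lower>\d+(?:\.\d+)?)\s*(?:-|–|to|,)\s*(?P<upper>\d+(?:\.\d+)?)"
)


def extract_ratio_ci(text):
    """Return a list of (kind, ratio, lower, upper) tuples found in text."""
    results = []
    for match in RATIO_CI.finditer(text):
        results.append((
            match.group("kind"),
            float(match.group("ratio")),
            float(match.group("lower")),
            float(match.group("upper")),
        ))
    return results


if __name__ == "__main__":
    sentence = (
        "Risk was elevated (OR 1.51, 95% CI 1.20-1.90; P < 0.001), "
        "while mortality fell, HR = 0.75 (95% CI: 0.60 to 0.94)."
    )
    for item in extract_ratio_ci(sentence):
        print(item)
```

Each extracted tuple pairs a ratio with the interval it depends on, which is what makes a consistency check possible without access to the underlying data.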

CONTACT

jonathan-wren@omrf.org or jdwren@gmail.com.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
