Estimating the deep replicability of scientific findings using human and artificial intelligence.

Affiliations

Northwestern University Institute on Complex Systems, Evanston, IL 60208.

Kellogg School of Management, Northwestern University, Evanston, IL 60208.

Publication Information

Proc Natl Acad Sci U S A. 2020 May 19;117(20):10762-10768. doi: 10.1073/pnas.1909046117. Epub 2020 May 4.

Abstract

Replicability tests of scientific papers show that the majority of papers fail replication. Moreover, failed papers circulate through the literature as quickly as replicating papers. This dynamic weakens the literature, raises research costs, and demonstrates the need for new approaches for estimating a study's replicability. Here, we trained an artificial intelligence model to estimate a paper's replicability using ground truth data on studies that had passed or failed manual replication tests, and then tested the model's generalizability on an extensive set of out-of-sample studies. The model predicts replicability better than the base rate of reviewers and on par with prediction markets, the best present-day method for predicting replicability. In out-of-sample tests on manually replicated papers from diverse disciplines and methods, the model had strong accuracy levels of 0.65 to 0.78. Exploring the reasons behind the model's predictions, we found no evidence for bias based on topics, journals, disciplines, base rates of failure, persuasion words, or novelty words like "remarkable" or "unexpected." We did find that the model's accuracy is higher when trained on a paper's text rather than its reported statistics and that n-grams, higher-order word combinations that humans have difficulty processing, correlate with replication. We discuss how combining human and machine intelligence can raise confidence in research, provide research self-assessment techniques, and create methods that are scalable and efficient enough to review the ever-growing numbers of publications, a task that entails extensive human resources to accomplish with prediction markets and manual replication alone.
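
The abstract does not specify the model's architecture, so the sketch below is only a plausible reading of the described approach: a classifier over n-gram text features trained on manually established replication outcomes. The pipeline choice (TF-IDF n-grams plus logistic regression) and all data shown are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: a bag-of-n-grams classifier for predicting
# whether a study will pass manual replication. The paper's actual
# model, features, and training corpus are not given in this abstract.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical ground-truth data: paper texts labeled by the outcome
# of a manual replication test (1 = replicated, 0 = failed).
texts = [
    "participants in the priming condition chose the target more often",
    "the preregistered effect held across all three direct replications",
]
replicated = [0, 1]

# Uni- to tri-gram features echo the abstract's finding that
# higher-order word combinations, hard for humans to track, carry signal.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 3)),
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, replicated)

# Estimated probability that a new, unseen paper would replicate.
print(clf.predict_proba(["an unexpected interaction emerged post hoc"])[0, 1])
```

In practice such a model would be trained on the full text of hundreds of manually replicated studies and evaluated out of sample, as the abstract describes; the two-example corpus here only demonstrates the shape of the pipeline.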

Cited By

Artificial intelligence and illusions of understanding in scientific research. Nature. 2024 Mar;627(8002):49-58. doi: 10.1038/s41586-024-07146-0. Epub 2024 Mar 6.

Science communication with generative AI. Nat Hum Behav. 2024 Apr;8(4):625-627. doi: 10.1038/s41562-024-01846-3.

References

Predicting the replicability of social science lab experiments. PLoS One. 2019 Dec 5;14(12):e0225826. doi: 10.1371/journal.pone.0225826. eCollection 2019.

Reviewer bias in single- versus double-blind peer review. Proc Natl Acad Sci U S A. 2017 Nov 28;114(48):12708-12713. doi: 10.1073/pnas.1707323114. Epub 2017 Nov 14.

Contextual sensitivity in scientific reproducibility. Proc Natl Acad Sci U S A. 2016 Jun 7;113(23):6454-9. doi: 10.1073/pnas.1521897113. Epub 2016 May 23.

Evaluating replicability of laboratory experiments in economics. Science. 2016 Mar 25;351(6280):1433-6. doi: 10.1126/science.aaf0918. Epub 2016 Mar 3.

Using prediction markets to estimate the reproducibility of scientific research. Proc Natl Acad Sci U S A. 2015 Dec 15;112(50):15343-7. doi: 10.1073/pnas.1516179112. Epub 2015 Nov 9.
