Predicting retracted research: a dataset and machine learning approaches.

Authors

Fletcher Aaron H A, Stevenson Mark

Affiliation

School of Computer Science, The University of Sheffield, Regent Court, Sheffield, S1 4DP, UK.

Publication

Res Integr Peer Rev. 2025 Jun 11;10(1):9. doi: 10.1186/s41073-025-00168-w.

DOI:10.1186/s41073-025-00168-w
PMID:40495239
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12153192/
Abstract

BACKGROUND

Retractions undermine the scientific record's reliability and can lead to the continued propagation of flawed research. This study aimed to (1) create a dataset aggregating retraction information with bibliographic metadata, (2) train and evaluate various machine learning approaches to predict article retractions, and (3) assess each feature's contribution to feature-based classifier performance using ablation studies.

METHODS

An open-access dataset was developed by combining information from the Retraction Watch database and the OpenAlex API. Using a case-controlled design, retracted research articles were paired with non-retracted articles published in the same period. Traditional feature-based classifiers and models leveraging contextual language representations were then trained and evaluated. Model performance was assessed using accuracy, precision, recall, and the F1-score.
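The pairing step can be illustrated with a minimal sketch. It assumes that "the same period" means the same publication year and that each control is used at most once; the abstract specifies neither, and the record fields (`id`, `year`) and the function name are hypothetical, not taken from the authors' code.

```python
import random
from collections import defaultdict


def pair_cases_with_controls(retracted, candidates, seed=0):
    """Pair each retracted article with one unused non-retracted
    article from the same publication year (an assumed matching rule)."""
    rng = random.Random(seed)

    # Group candidate controls by publication year.
    pool = defaultdict(list)
    for article in candidates:
        pool[article["year"]].append(article)

    # Shuffle within each year so the chosen control is random but reproducible.
    for articles in pool.values():
        rng.shuffle(articles)

    pairs = []
    for case in retracted:
        same_year = pool[case["year"]]
        if same_year:  # skip cases with no available control
            pairs.append((case, same_year.pop()))
    return pairs


# Toy records standing in for merged Retraction Watch / OpenAlex metadata.
retracted = [
    {"id": "R1", "year": 2019},
    {"id": "R2", "year": 2021},
    {"id": "R3", "year": 2005},
]
candidates = [
    {"id": "C1", "year": 2019},
    {"id": "C2", "year": 2021},
    {"id": "C3", "year": 2021},
]

pairs = pair_cases_with_controls(retracted, candidates)
for case, control in pairs:
    print(case["id"], "<->", control["id"], "year", case["year"])
```

In this toy run R3 stays unpaired because no candidate shares its year. A real pipeline would need an explicit policy for such cases, for example widening the matching window.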

RESULTS

The Llama 3.2 base model achieved the highest overall accuracy. The Random Forest classifier achieved a precision of 0.687 for identifying non-retracted articles, while the Llama 3.2 base model reached a precision of 0.683 for identifying retracted articles. Traditional feature-based classifiers generally outperformed most contextual language models, except for the Llama 3.2 base model, which showed competitive performance across several metrics.
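Because the results report precision separately for the retracted and the non-retracted class, it may help to see how per-class metrics differ from overall accuracy. The sketch below uses invented labels (1 = retracted, 0 = non-retracted) and a hypothetical helper; it does not reproduce the paper's data or evaluation code.

```python
def per_class_metrics(y_true, y_pred, positive):
    """Precision, recall and F1 when `positive` is treated as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Invented example: four retracted (1) and four non-retracted (0) articles.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

retracted_metrics = per_class_metrics(y_true, y_pred, positive=1)
non_retracted_metrics = per_class_metrics(y_true, y_pred, positive=0)
overall_accuracy = accuracy(y_true, y_pred)

print("retracted:", retracted_metrics)
print("non-retracted:", non_retracted_metrics)
print("accuracy:", overall_accuracy)
```

Here the same predictions give a precision of 2/3 for the retracted class and 0.6 for the non-retracted class, with an accuracy of 0.625. This is why a model can lead on one class-specific metric without leading overall.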

CONCLUSIONS

Although no single model excelled across all metrics, our findings indicate that machine learning techniques can effectively support the identification of retracted research. These results provide a foundation for developing automated tools to assist publishers and reviewers in detecting potentially problematic publications. Further research should focus on refining these models and investigating additional features to improve predictive performance.

TRIAL REGISTRATION

Not applicable.

Figures

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5d28/12153192/0efa42e62a98/41073_2025_168_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5d28/12153192/550ccca60fce/41073_2025_168_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5d28/12153192/3b819ed1096c/41073_2025_168_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5d28/12153192/ba19ed7cb729/41073_2025_168_Fig4_HTML.jpg

Similar articles

1
Predicting retracted research: a dataset and machine learning approaches.
Res Integr Peer Rev. 2025 Jun 11;10(1):9. doi: 10.1186/s41073-025-00168-w.
2
Research misconduct in health and life sciences research: A systematic review of retracted literature from Brazilian institutions.
PLoS One. 2019 Apr 15;14(4):e0214272. doi: 10.1371/journal.pone.0214272. eCollection 2019.
3
An analysis of retractions of dental publications.
J Dent. 2018 Dec;79:19-23. doi: 10.1016/j.jdent.2018.09.002. Epub 2018 Sep 8.
4
Retracted articles in the obstetrics literature: lessons from the past to change the future.
Am J Obstet Gynecol MFM. 2020 Nov;2(4):100201. doi: 10.1016/j.ajogmf.2020.100201. Epub 2020 Aug 19.
5
Comprehensive analysis of retracted journal articles in the field of veterinary medicine and animal health.
BMC Vet Res. 2022 Feb 18;18(1):73. doi: 10.1186/s12917-022-03167-x.
6
Predictive modeling and optimization in dermatology: Machine learning for skin disease classification.
Comput Biol Med. 2025 May;189:109946. doi: 10.1016/j.compbiomed.2025.109946. Epub 2025 Mar 3.
7
A survey of retracted articles in dentistry.
BMC Res Notes. 2017 Jul 6;10(1):253. doi: 10.1186/s13104-017-2576-y.
8
Identifying determinants of malnutrition in under-five children in Bangladesh: insights from the BDHS-2022 cross-sectional study.
Sci Rep. 2025 Apr 24;15(1):14336. doi: 10.1038/s41598-025-99288-y.
9
Leveraging code-free deep learning for pill recognition in clinical settings: A multicenter, real-world study of performance across multiple platforms.
Artif Intell Med. 2024 Apr;150:102844. doi: 10.1016/j.artmed.2024.102844. Epub 2024 Mar 13.
10
Exploring the characteristics, global distribution and reasons for retraction of published articles involving human research participants: a literature survey.
J Multidiscip Healthc. 2018 Jan 18;11:39-47. doi: 10.2147/JMDH.S151745. eCollection 2018.
