Detecting fabrication in large-scale molecular omics data.

Affiliations

Computer Science Department, University of Colorado Boulder, Boulder, Colorado, United States of America.

Biology Department, Brigham Young University, Provo, Utah, United States of America.

Publication information

PLoS One. 2021 Nov 30;16(11):e0260395. doi: 10.1371/journal.pone.0260395. eCollection 2021.

Abstract

Fraud is a pervasive problem and can occur as fabrication, falsification, plagiarism, or theft. The scientific community is not exempt from this universal problem, and several studies have recently been found to contain manipulated or fabricated data. Current measures to prevent and deter scientific misconduct come in the form of the peer-review process and on-site clinical trial auditors. As recent advances in high-throughput omics technologies have moved biology into the realm of big data, fraud detection methods must be updated for sophisticated computational fraud. In the financial sector, machine learning and digit frequencies are successfully used to detect fraud. Drawing from these sources, we develop methods of fabrication detection in biomedical research and show that machine learning can be used to detect fraud in large-scale omics experiments. Using gene copy-number data as input, machine learning models correctly predicted fraud with 58%-100% accuracy. With digit frequencies as input features, the models detected fraud with 82%-100% accuracy. All of the data and analysis scripts used in this project are available at https://github.com/MSBradshaw/FakeData.
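The abstract does not say how the digit-frequency features are computed, so the following is only a minimal sketch of one common variant: the relative frequency of each leading digit (1 through 9) across the measurements of a sample. The function names `first_digit` and `digit_frequencies` are hypothetical and are not taken from the authors' repository; the actual analysis scripts are at the URL given above.

```python
import math
from collections import Counter
from typing import Iterable, List, Optional


def first_digit(value: float) -> Optional[int]:
    """Return the leading nonzero digit of a number, or None if it has none.

    Assumption: zeros and non-finite values (NaN, inf) carry no leading
    digit and are skipped rather than counted.
    """
    if not math.isfinite(value) or value == 0:
        return None
    # Scientific notation always puts the leading nonzero digit first,
    # e.g. 0.0045 -> "4.500000000000000e-03".
    return int(format(abs(value), ".15e")[0])


def digit_frequencies(values: Iterable[float]) -> List[float]:
    """Return the relative frequency of leading digits 1..9 as nine floats.

    The result could serve as a fixed-length feature vector for a
    classifier. If no value has a leading digit, all frequencies are 0.0.
    """
    digits = [d for d in (first_digit(v) for v in values) if d is not None]
    counts = Counter(digits)
    total = len(digits)
    if total == 0:
        return [0.0] * 9
    return [counts.get(d, 0) / total for d in range(1, 10)]


if __name__ == "__main__":
    # Hypothetical measurements for one sample, for illustration only.
    sample = [123.0, 0.0045, 1.9, 27.0, 0.0]
    print(digit_frequencies(sample))
```

Each sample yields a nine-element vector regardless of how many measurements it contains, which is what makes this kind of feature convenient as classifier input. Whether the authors used leading digits only, or additional digit positions, is not stated in the abstract.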
