Major data analysis errors invalidate cancer microbiome findings.

Author information

Gihawi Abraham, Ge Yuchen, Lu Jennifer, Puiu Daniela, Xu Amanda, Cooper Colin S, Brewer Daniel S, Pertea Mihaela, Salzberg Steven L

Affiliations

Norwich Medical School, University of East Anglia, Norwich, UK.

Center for Computational Biology, Johns Hopkins University, Baltimore, Maryland, USA.

Publication information

bioRxiv. 2023 Jul 31:2023.07.28.550993. doi: 10.1101/2023.07.28.550993.

Abstract

We re-analyzed the data from a recent large-scale study that reported strong correlations between microbial organisms and 33 different cancer types, and that created machine learning predictors with near-perfect accuracy at distinguishing among cancers. We found at least two fundamental flaws in the reported data and in the methods: (1) errors in the genome database and the associated computational methods led to millions of false positive findings of bacterial reads across all samples, largely because most of the sequences identified as bacteria were instead human; and (2) errors in transformation of the raw data created an artificial signature, even for microbes with no reads detected, tagging each tumor type with a distinct signal that the machine learning programs then used to create an apparently accurate classifier. Each of these problems invalidates the results, leading to the conclusion that the microbiome-based classifiers for identifying cancer presented in the study are entirely wrong. These flaws have subsequently affected more than a dozen additional published studies that used the same data and whose results are likely invalid as well.
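The second flaw can be illustrated with a toy sketch. It is not the normalization used in the re-analyzed study: the `GROUP_OFFSET` values, the `transform` function, and the sample labels are all hypothetical, chosen only to show how a transformation that depends on tumor type can give a microbe with zero reads a distinct value per tumor type, which even a trivial classifier can then exploit.

```python
import math

# Hypothetical offsets standing in for a normalization step that
# depends on tumor type (an assumption, not the study's actual method).
GROUP_OFFSET = {"tumor_A": -1.0, "tumor_B": 0.0, "tumor_C": 1.0}

def transform(reads: int, tumor_type: str) -> float:
    """Log-transform a read count, then apply a tumor-type-dependent offset."""
    return math.log2(reads + 0.5) + GROUP_OFFSET[tumor_type]

def make_samples(n_per_type: int):
    """Build (tumor_type, reads) samples where the microbe has zero reads everywhere."""
    return [(t, 0) for t in GROUP_OFFSET for _ in range(n_per_type)]

def fit_lookup(train):
    """'Train' a trivial classifier: remember which transformed value went with which label."""
    return {transform(reads, t): t for t, reads in train}

def accuracy(lookup, test):
    """Fraction of test samples whose transformed value maps back to the right label."""
    hits = sum(lookup.get(transform(reads, t)) == t for t, reads in test)
    return hits / len(test)

train = make_samples(5)
test = make_samples(5)

# Raw counts carry no information: every sample has the same value (0).
raw_distinct = {reads for _, reads in train}
# After the label-dependent transform, each tumor type has its own value.
transformed_distinct = {transform(reads, t) for t, reads in train}

acc = accuracy(fit_lookup(train), test)
print(len(raw_distinct), len(transformed_distinct), acc)
```

In this sketch the raw data contain a single distinct value, while the transformed data contain one value per tumor type, so the lookup classifier separates the tumor types without any microbial signal being present.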

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0546/10418105/7b5f64363d51/nihpp-2023.07.28.550993v1-f0001.jpg
