Deep learning identifies erroneous microarray-based, gene-level conclusions in literature.

Authors

Qin Yanan, Yi Daiyao, Chen Xianghao, Guan Yuanfang

Affiliation

Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA.

Publication information

NAR Genom Bioinform. 2021 Oct 4;3(4):lqab089. doi: 10.1093/nargab/lqab089. eCollection 2021 Dec.

DOI: 10.1093/nargab/lqab089

PMID: 34617014

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8489595/

Abstract

More than 110 000 publications have used microarrays to decipher phenotype-associated genes, clinical biomarkers and gene functions. Microarrays rely on digitally assaying the fluorescence signals of arrays. In this study, we retrospectively constructed raw images for 37 724 published microarray data, and developed deep learning algorithms to automatically detect systematic defects. We report that an alarming 26.73% of the microarray-based studies are affected by serious imaging defects. By literature mining, we found that publications associated with these affected microarrays have reported disproportionately more biological discoveries on the genes in the contaminated areas than on other genes. In total, 28.82% of the gene-level conclusions reported in these publications were based on measurements falling into the contaminated area, indicating severe, systematic problems caused by such contamination. We provide the identified problematic published datasets, the affected genes and the imputed arrays, as well as software tools for scanning for such contamination, which will be essential for future studies that scrutinize and critically analyze microarray data.
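To make the last statistic concrete, the following is a minimal sketch of how a share of affected gene-level conclusions could be computed once a defect mask is available. It is not the authors' pipeline. The binary mask, the probe-to-gene layout, the `min_fraction` threshold, the function names and the gene symbols are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def genes_in_contaminated_area(mask, probe_to_gene, min_fraction=0.5):
    """Return genes whose probes largely fall inside flagged cells.

    mask: 2D list of 0/1, where 1 marks a cell flagged as defective
          (stands in for the output of some defect detector).
    probe_to_gene: dict mapping (row, col) -> gene symbol (hypothetical layout).
    min_fraction: share of a gene's probes that must be flagged
                  (an assumed threshold, not taken from the paper).
    """
    total = defaultdict(int)
    flagged = defaultdict(int)
    for (row, col), gene in probe_to_gene.items():
        total[gene] += 1
        if mask[row][col]:
            flagged[gene] += 1
    return {g for g in total if flagged[g] / total[g] >= min_fraction}


def affected_share(reported_genes, contaminated_genes):
    """Fraction of reported genes that are in the contaminated set."""
    if not reported_genes:
        return 0.0
    hits = sum(1 for g in reported_genes if g in contaminated_genes)
    return hits / len(reported_genes)


# Toy 3x4 array with a defect covering most of the two right-hand columns.
mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
probe_to_gene = {
    (0, 0): "GENE_A", (1, 0): "GENE_A",
    (0, 2): "GENE_B", (1, 3): "GENE_B",
    (2, 2): "GENE_C", (2, 3): "GENE_C",
    (0, 1): "GENE_D", (0, 3): "GENE_D", (1, 1): "GENE_D",
}

contaminated = genes_in_contaminated_area(mask, probe_to_gene)
share = affected_share(["GENE_A", "GENE_B", "GENE_C", "GENE_D"], contaminated)
print(sorted(contaminated), share)
```

In this toy layout, two of the four reported genes have at least half of their probes inside the flagged region, so the computed share is 0.5. A real analysis would need the platform's actual probe layout and a defensible rule for when a gene counts as affected.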

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d5c3/8489595/3bc68ba0ff64/lqab089fig1.jpg
