Gene name errors: Lessons not learned.

Affiliation

Deakin University, School of Life and Environmental Sciences, Geelong, Australia.

Publication information

PLoS Comput Biol. 2021 Jul 30;17(7):e1008984. doi: 10.1371/journal.pcbi.1008984. eCollection 2021 Jul.

DOI:10.1371/journal.pcbi.1008984
PMID:34329294
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8357140/
Abstract

Erroneous conversion of gene names into dates and other data types has been a frustration for computational biologists for years. We hypothesized that such errors in supplementary files might diminish after a report in 2016 highlighting the extent of the problem. To assess this, we performed a scan of supplementary files published in PubMed Central from 2014 to 2020. Overall, gene name errors continued to accumulate unabated in the period after 2016. An improved scanning software we developed identified gene name errors in 30.9% (3,436/11,117) of articles with supplementary Excel gene lists, a figure significantly higher than previously estimated. This is due to gene names being converted not just to dates and floating-point numbers, but also to the internal date format (five-digit numbers). These findings further reinforce that spreadsheets are ill-suited to use with large genomic data.
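
The three error classes named in the abstract (dates, floating-point numbers, and five-digit internal date numbers) can be illustrated with a minimal Python sketch. This is not the authors' scanning software: the regular expressions, the function names `classify_cell` and `scan_column`, and the sample values are all assumptions made for illustration. A five-digit check in particular would also flag legitimate numeric identifiers, so a real scanner would need more context than this.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical patterns for illustration only; not taken from the paper's scanner.
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
DATE_RE = re.compile(rf"^(\d{{1,2}}-({MONTHS})|({MONTHS})-\d{{1,2}})$", re.IGNORECASE)
FLOAT_RE = re.compile(r"^\d\.\d+E\+\d+$", re.IGNORECASE)  # e.g. 2.31E+13
SERIAL_RE = re.compile(r"^\d{5}$")                        # e.g. 44075


def classify_cell(cell: str) -> Optional[str]:
    """Return the suspected error class for one cell, or None if it looks fine."""
    value = cell.strip()
    if DATE_RE.match(value):
        return "date"
    if FLOAT_RE.match(value):
        return "float"
    if SERIAL_RE.match(value):
        return "serial"
    return None


def scan_column(cells: List[str]) -> List[Tuple[str, str]]:
    """Collect (cell, error class) pairs for every suspicious cell in a gene column."""
    hits = []
    for cell in cells:
        kind = classify_cell(cell)
        if kind is not None:
            hits.append((cell, kind))
    return hits


if __name__ == "__main__":
    column = ["TP53", "1-Sep", "BRCA1", "2.31E+13", "44075", "SEPT1"]
    print(scan_column(column))
    # The proportion reported in the abstract: 3,436 of 11,117 articles.
    print(round(100 * 3436 / 11117, 1))
```

Intact symbols such as `SEPT1` are not flagged; only the converted forms match a pattern.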

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0d7c/8357140/2bd60b682194/pcbi.1008984.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0d7c/8357140/c281a34b6767/pcbi.1008984.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0d7c/8357140/3c10fdcc9af7/pcbi.1008984.g002.jpg

Similar articles

1
Gene name errors: Lessons not learned.
PLoS Comput Biol. 2021 Jul 30;17(7):e1008984. doi: 10.1371/journal.pcbi.1008984. eCollection 2021 Jul.
2
Gene name errors are widespread in the scientific literature.
Genome Biol. 2016 Aug 23;17(1):177. doi: 10.1186/s13059-016-1044-7.
3
Mistaken identifiers: gene name errors can be introduced inadvertently when using Excel in bioinformatics.
BMC Bioinformatics. 2004 Jun 23;5:80. doi: 10.1186/1471-2105-5-80.
4
The importance of being Earnest: I can't glean which gene you mean.
Physiol Genomics. 2011 Feb 24;43(4):187. doi: 10.1152/physiolgenomics.00252.2010. Epub 2010 Dec 21.
5
Gene Updater: a web tool that autocorrects and updates for Excel misidentified gene names.
Sci Rep. 2022 Jul 26;12(1):12743. doi: 10.1038/s41598-022-17104-3.
6
Truke, a web tool to check for and handle excel misidentified gene symbols.
BMC Genomics. 2017 Mar 21;18(1):242. doi: 10.1186/s12864-017-3631-8.
7
Biosom: gene synonym analysis by self-organizing map.
Genet Mol Res. 2015 Feb 20;14(1):1461-8. doi: 10.4238/2015.February.20.1.
8
PSE: a tool for browsing a large amount of MEDLINE/PubMed abstracts with gene names and common words as the keywords.
BMC Bioinformatics. 2005 Dec 10;6:295. doi: 10.1186/1471-2105-6-295.
9
Escape Excel: A tool for preventing gene symbol and accession conversion errors.
PLoS One. 2017 Sep 27;12(9):e0185207. doi: 10.1371/journal.pone.0185207. eCollection 2017.
10
Interspecies Gene Name Extrapolation--A New Approach.
PLoS One. 2015 Sep 25;10(9):e0138751. doi: 10.1371/journal.pone.0138751. eCollection 2015.

Cited by

1
Catalyzing computational biology research at an academic institute through an interest network.
PLoS Comput Biol. 2025 Sep 10;21(9):e1013453. doi: 10.1371/journal.pcbi.1013453. eCollection 2025 Sep.
2
Datavzrd: Rapid programming- and maintenance-free interactive visualization and communication of tabular data.
PLoS One. 2025 Jul 22;20(7):e0323079. doi: 10.1371/journal.pone.0323079. eCollection 2025.
3
From Spreadsheets and Bespoke Models to Enterprise Data Warehouses: GPT-enabled Clinical Data Ingestion into i2b2.
medRxiv. 2025 Apr 19:2025.04.17.25325962. doi: 10.1101/2025.04.17.25325962.
4
Two subtle problems with overrepresentation analysis.
Bioinform Adv. 2024 Oct 21;4(1):vbae159. doi: 10.1093/bioadv/vbae159. eCollection 2024.
5
What is the real value of omics data? Enhancing research outcomes and securing long-term data excellence.
Nucleic Acids Res. 2024 Nov 11;52(20):12130-12140. doi: 10.1093/nar/gkae901.
6
The five pillars of computational reproducibility: bioinformatics and beyond.
Brief Bioinform. 2023 Sep 22;24(6). doi: 10.1093/bib/bbad375.
7
Metadata integrity in bioinformatics: Bridging the gap between data and knowledge.
Comput Struct Biotechnol J. 2023 Oct 5;21:4895-4913. doi: 10.1016/j.csbj.2023.10.006. eCollection 2023.
8
A technical guide to TRITEX, a computational pipeline for chromosome-scale sequence assembly of plant genomes.
Plant Methods. 2022 Dec 2;18(1):128. doi: 10.1186/s13007-022-00964-1.
9
Towards comprehensive integration and curation of chloroplast genomes.
Plant Biotechnol J. 2022 Dec;20(12):2239-2241. doi: 10.1111/pbi.13923. Epub 2022 Sep 20.
10
Sharing Begins at Home: How Continuous and Ubiquitous FAIRness Can Enhance Research Productivity and Data Reuse.
Harv Data Sci Rev. 2022 Summer;4(3). doi: 10.1162/99608f92.44d21b86. Epub 2022 Jul 28.

References

1
HGNChelper: identification and correction of invalid gene symbols for human and mouse.
F1000Res. 2020 Dec 21;9:1493. doi: 10.12688/f1000research.28033.2. eCollection 2020.
2
Guidelines for human gene nomenclature.
Nat Genet. 2020 Aug;52(8):754-758. doi: 10.1038/s41588-020-0669-3.
3
Prestigious Science Journals Struggle to Reach Even Average Reliability.
Front Hum Neurosci. 2018 Feb 20;12:37. doi: 10.3389/fnhum.2018.00037. eCollection 2018.
4
Escape Excel: A tool for preventing gene symbol and accession conversion errors.
PLoS One. 2017 Sep 27;12(9):e0185207. doi: 10.1371/journal.pone.0185207. eCollection 2017.
5
Truke, a web tool to check for and handle excel misidentified gene symbols.
BMC Genomics. 2017 Mar 21;18(1):242. doi: 10.1186/s12864-017-3631-8.
6
Legible ledgers.
Nat Genet. 2016 Sep 28;48(10):1101. doi: 10.1038/ng.3690.
7
Gene name errors are widespread in the scientific literature.
Genome Biol. 2016 Aug 23;17(1):177. doi: 10.1186/s13059-016-1044-7.
8
Error rates in a clinical data repository: lessons from the transition to electronic data transfer--a descriptive study.
BMJ Open. 2013 May 28;3(5):e002406. doi: 10.1136/bmjopen-2012-002406.
9
Reproducible research in computational science.
Science. 2011 Dec 2;334(6060):1226-7. doi: 10.1126/science.1213847.
10
Mistaken identifiers: gene name errors can be introduced inadvertently when using Excel in bioinformatics.
BMC Bioinformatics. 2004 Jun 23;5:80. doi: 10.1186/1471-2105-5-80.