Biological databases in the age of generative artificial intelligence.

Authors

Pop Mihai, Attwood Teresa K, Blake Judith A, Bourne Philip E, Conesa Ana, Gaasterland Terry, Hunter Lawrence, Kingsford Carl, Kohlbacher Oliver, Lengauer Thomas, Markel Scott, Moreau Yves, Noble William S, Orengo Christine, Ouellette B F Francis, Parida Laxmi, Przulj Natasa, Przytycka Teresa M, Ranganathan Shoba, Schwartz Russell, Valencia Alfonso, Warnow Tandy

Affiliations

Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, United States.

Department of Computer Science, The University of Manchester, Manchester M13 9PL, United Kingdom.

Publication information

Bioinform Adv. 2025 Mar 20;5(1):vbaf044. doi: 10.1093/bioadv/vbaf044. eCollection 2025.

Abstract

SUMMARY

Modern biological research critically depends on public databases. The introduction and propagation of errors within and across databases can lead to wasted resources as scientists are led astray by bad data or have to conduct expensive validation experiments. The emergence of generative artificial intelligence systems threatens to compound this problem owing to the ease with which massive volumes of synthetic data can be generated. We provide an overview of several key issues that occur within the biological data ecosystem and make several recommendations aimed at reducing data errors and their propagation. We specifically highlight the critical importance of improved educational programs aimed at biologists and life scientists that emphasize best practices in data engineering. We also argue for increased theoretical and empirical research on data provenance, error propagation, and on understanding the impact of errors on analytic pipelines. Furthermore, we recommend enhanced funding for the stewardship and maintenance of public biological databases.

AVAILABILITY AND IMPLEMENTATION

Not applicable.
