Where in the genome are we? A cautionary tale of database use in genomics research.

Affiliation

Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA.

Publication information

Front Genet. 2013 Mar 21;4:38. doi: 10.3389/fgene.2013.00038. eCollection 2013.

DOI:10.3389/fgene.2013.00038

PMID:23519237

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3604632/

Abstract

With the advent of high-throughput genomic technologies, the volume of available data is now staggering. In addition, databases that provide resources to annotate, translate, and connect biological data have grown exponentially in content and use. The availability of such data emphasizes the importance of bioinformatics and computational biology in genomics research and has led to the development of thousands of tools to integrate and utilize these resources. When such resources are used, the principles of reproducible research are often overlooked. In this manuscript we provide selected case studies illustrating issues that may arise while working with genes and genetic polymorphisms. These case studies illustrate potential sources of error that can be introduced when the practices of reproducible research are not employed and non-concurrent databases are used. We also show examples of popular bioinformatics tools that lack transparency about the databases they rely on. These examples highlight that resources are constantly evolving and that, to provide reproducible results, researchers should be aware of, and tie their work to, the correct release of the data, particularly when implementing computational tools.
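
One way to act on the abstract's closing recommendation is to store the database release next to every annotation and to check for mixed releases before results are combined. The following is only a minimal sketch under stated assumptions: the names `Provenance`, `Annotation`, and `check_concurrent`, the database label `example_snp_db`, and the release labels are all hypothetical and do not come from the article or from any real tool. It also simplifies "non-concurrent" to mean the same database appearing under more than one release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    """Where an annotation came from (all fields are illustrative)."""
    database: str  # hypothetical resource name, e.g. "example_snp_db"
    release: str   # release or build label as reported by the resource
    accessed: str  # ISO date on which the data were retrieved


@dataclass(frozen=True)
class Annotation:
    identifier: str     # gene or variant identifier
    value: str          # the annotated value, e.g. a position
    source: Provenance  # release information travels with the value


def check_concurrent(annotations):
    """Return {database: release}, or raise if a database is seen
    under more than one release in the same analysis."""
    releases = {}
    for ann in annotations:
        releases.setdefault(ann.source.database, set()).add(ann.source.release)
    mixed = {db: sorted(r) for db, r in releases.items() if len(r) > 1}
    if mixed:
        raise ValueError(f"non-concurrent releases in use: {mixed}")
    return {db: next(iter(r)) for db, r in releases.items()}


if __name__ == "__main__":
    src = Provenance("example_snp_db", "r1", "2013-01-01")
    anns = [
        Annotation("variant_A", "chr1:100", src),
        Annotation("variant_B", "chr1:200", src),
    ]
    # The returned mapping can be written into the methods section or a log.
    print(check_concurrent(anns))
```

Keeping the release inside each record, rather than in a separate note, means the information survives when annotations from several runs are merged, which is the situation in which mixed releases are most likely to go unnoticed.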
