Kumar Sudhir, Dudley Joel
Center for Evolutionary Functional Genomics, The Biodesign Institute and School of Life Sciences, Arizona State University, Tempe, Arizona 85287-5301, USA.
Bioinformatics. 2007 Jul 15;23(14):1713-7. doi: 10.1093/bioinformatics/btm239. Epub 2007 May 7.
The genome sequencing revolution is approaching a landmark figure of 1000 completely sequenced genomes. Coupled with fast-declining per-base sequencing costs, this influx of DNA sequence data has encouraged laboratory scientists to engage large datasets in comparative sequence analyses for making evolutionary, functional and translational inferences. However, the majority of the scientists at the forefront of experimental research are not bioinformaticians, so a gap exists between the user-friendly software needed and the scripting/programming infrastructure often employed for the analysis of large numbers of genes, long genomic segments and groups of sequences. We see an urgent need for the expansion of the fundamental paradigms under which biologist-friendly software tools are designed and developed to fulfill the needs of biologists to analyze large datasets by using sophisticated computational methods. We argue that the design principles need to be sensitive to the reality that comparatively small teams of biologists have historically developed some of the most popular biological software packages in molecular evolutionary analysis. Furthermore, biological intuitiveness and investigator empowerment need to take precedence over the current supposition that biologists should re-tool and become programmers when analyzing genome-scale datasets.