Blankenberg Daniel, Taylor James, Schenck Ian, He Jianbin, Zhang Yi, Ghent Matthew, Veeraraghavan Narayanan, Albert Istvan, Miller Webb, Makova Kateryna D, Hardison Ross C, Nekrutenko Anton
Center for Comparative Genomics and Bioinformatics, Huck Institutes of the Life Sciences, Penn State University, University Park, Pennsylvania 16802, USA.
Genome Res. 2007 Jun;17(6):960-4. doi: 10.1101/gr.5578007.
The standardization and sharing of data and tools are the biggest challenges of large collaborative projects such as the Encyclopedia of DNA Elements (ENCODE). Here we describe a compact Web application, Galaxy2(ENCODE), that effectively addresses these issues. It provides an intuitive interface for depositing and accessing data, and features a vast number of analysis tools, including operations on genomic intervals, utilities for manipulating multiple sequence alignments, and molecular evolution algorithms. By providing a direct link between data and analysis tools, Galaxy2(ENCODE) makes it possible to address biological questions that are beyond the reach of existing software. We use Galaxy2(ENCODE) to show that the ENCODE regions contain >2000 unannotated transcripts that are under strong purifying selection and are therefore likely functional. We also show that the ENCODE regions are representative of the entire genome by estimating the rate of nucleotide substitution and comparing it to published data. Although each of these analyses is complex, none takes more than 15 min from beginning to end. Finally, we demonstrate how new tools can be added to Galaxy2(ENCODE) with almost no effort. Every section of the manuscript is supplemented with QuickTime screencasts. Galaxy2(ENCODE) and the screencasts can be accessed at http://g2.bx.psu.edu.