Jameson Daniel, Garwood Kevin, Garwood Chris, Booth Tim, Alper Pinar, Oliver Stephen G, Paton Norman W
School of Chemistry, Manchester Interdisciplinary Biocentre, The University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK.
BMC Bioinformatics. 2008 Apr 10;9:183. doi: 10.1186/1471-2105-9-183.
The systematic capture of appropriately annotated experimental data is a prerequisite for most bioinformatics analyses. Data capture is required not only for submission of data to public repositories, but also to underpin integrated analysis, archiving, and sharing, both within laboratories and in collaborative projects. Because the requirement is so widespread, data capture and annotation are taking place at many sites, yet the limited literature on tools, techniques and experiences suggests that there is work to be done to identify good practice and reduce duplication of effort.
This paper reports on experience gained in the deployment of the Pedro data capture tool in a range of representative bioinformatics applications. The paper makes explicit the requirements that have recurred when capturing data in different contexts, indicates how these requirements are addressed in Pedro, and describes case studies that illustrate where the requirements have arisen in practice.
Data capture is a fundamental activity for bioinformatics; all biological data resources build on some form of data capture activity, and many require a blend of import, analysis and annotation. Recurring requirements in data capture suggest that model-driven architectures can be used to construct data capture infrastructures that can be rapidly configured to meet the needs of individual use cases. We have described how one such model-driven infrastructure, namely Pedro, has been deployed in representative case studies, and discussed the extent to which the model-driven approach has been effective in practice.
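To make the model-driven idea concrete, the following is a minimal sketch rather than Pedro's actual implementation. It assumes a declarative model expressed as a list of field specifications; the names `FieldSpec`, `SAMPLE_MODEL`, and `capture`, and the example fields and vocabulary, are all hypothetical. The point is only that the capture logic is generic, so a new use case is supported by supplying a different model rather than new code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class FieldSpec:
    """One entry in a (hypothetical) data capture model."""
    name: str
    convert: Callable[[str], Any]            # e.g. str, int, float
    required: bool = True
    allowed: Optional[FrozenSet[Any]] = None  # controlled vocabulary, if any


# Hypothetical model for one use case; another use case would supply
# a different list and reuse capture() unchanged.
SAMPLE_MODEL: List[FieldSpec] = [
    FieldSpec("sample_id", str),
    FieldSpec("organism", str, allowed=frozenset({"S. cerevisiae", "E. coli"})),
    FieldSpec("temperature_c", float, required=False),
]


def capture(model: List[FieldSpec],
            raw: Dict[str, str]) -> Tuple[Dict[str, Any], List[str]]:
    """Validate raw string input against the model.

    Returns the converted record and a list of error messages.
    """
    record: Dict[str, Any] = {}
    errors: List[str] = []
    for spec in model:
        value = raw.get(spec.name, "")
        if value == "":
            if spec.required:
                errors.append(f"{spec.name}: missing required value")
            continue
        try:
            converted = spec.convert(value)
        except ValueError:
            errors.append(f"{spec.name}: cannot convert {value!r}")
            continue
        if spec.allowed is not None and converted not in spec.allowed:
            errors.append(f"{spec.name}: {converted!r} not in controlled vocabulary")
            continue
        record[spec.name] = converted
    return record, errors


if __name__ == "__main__":
    rec, errs = capture(
        SAMPLE_MODEL,
        {"sample_id": "S1", "organism": "E. coli", "temperature_c": "30"},
    )
    print(rec, errs)
```

In this sketch, configuring the infrastructure for a new use case amounts to editing the model alone; the trade-off is that requirements the model cannot express would still need custom code.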