Fogh Rasmus H, Boucher Wayne, Ionides John M C, Vranken Wim F, Stevens Tim J, Laue Ernest D
Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK.
J Integr Bioinform. 2010 Mar 25;7(3):475. doi: 10.2390/biecoll-jib-2010-123.
In recent years the amount of biological data has exploded to the point where much useful information can only be extracted by complex computational analyses. Such analyses are greatly facilitated by metadata standards, both because they make it possible to compare data originating from different sources and because they allow data to be exchanged in standard forms, e.g. when running processes on a distributed computing infrastructure. However, standards thrive on stability, whereas science is constantly moving, with new methods being developed and old ones modified. Maintaining both the metadata standards and all the code required to make them useful is therefore a non-trivial problem. Memops is a framework that uses an abstract definition of the metadata (described in UML) to generate internal data structures and subroutine libraries for data access (application programming interfaces, or APIs, currently in Python, C and Java) and data storage (in XML files or databases). For the individual project, these libraries obviate the need to write code for input parsing, validity checking or output. Memops also ensures that the code is always internally consistent, massively reducing the need for code reorganisation. Across a scientific domain, a Memops-supported data model makes it easier to support complex standards that can capture all the data produced in a scientific area, share them among all programs in a complex software pipeline, and carry them forward to deposition in an archive. The principles behind the Memops generation code will be presented, along with example applications in Nuclear Magnetic Resonance (NMR) spectroscopy and structural biology.
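The general idea of deriving data access, validity checking and storage code from a single abstract definition can be illustrated with a minimal Python sketch. This is not the Memops generator or its API: the `MODEL` dictionary (a stand-in for a UML description), the `make_class` helper and the `Peak` class are all hypothetical names chosen for illustration, and the XML layout is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical abstract model (stand-in for a UML description):
# class name -> {attribute name: required Python type}.
MODEL = {"Peak": {"position": float, "height": float, "label": str}}


def make_class(name, attrs):
    """Build a class whose checks, XML writer and XML reader all derive from attrs."""

    def __init__(self, **values):
        missing = set(attrs) - set(values)
        if missing:
            raise ValueError(f"{name}: missing attributes {sorted(missing)}")
        for key, value in values.items():
            setattr(self, key, value)  # routed through the checking setter

    def __setattr__(self, key, value):
        # Validity checking is driven entirely by the model.
        if key not in attrs:
            raise AttributeError(f"{name} has no attribute {key!r}")
        if not isinstance(value, attrs[key]):
            raise TypeError(f"{name}.{key} must be {attrs[key].__name__}")
        object.__setattr__(self, key, value)

    def to_xml(self):
        # One XML element per object, one XML attribute per model attribute.
        element = ET.Element(name)
        for key in attrs:
            element.set(key, str(getattr(self, key)))
        return ET.tostring(element, encoding="unicode")

    @classmethod
    def from_xml(cls, text):
        # Parse the element and convert each string back to its model type.
        element = ET.fromstring(text)
        return cls(**{key: typ(element.get(key)) for key, typ in attrs.items()})

    namespace = {
        "__init__": __init__,
        "__setattr__": __setattr__,
        "to_xml": to_xml,
        "from_xml": from_xml,
    }
    return type(name, (object,), namespace)


# Usage: generate the class from the model, then round-trip an object via XML.
Peak = make_class("Peak", MODEL["Peak"])
peak = Peak(position=8.25, height=1.5, label="HN")
copy = Peak.from_xml(peak.to_xml())
```

Because the setter, writer and reader are all produced from the same definition, adding an attribute to the hypothetical `MODEL` changes all three consistently, which is the kind of internal consistency the abstract attributes to generated code.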