CyberInfrastructure Section, Virginia Bioinformatics Institute, Washington Street, MC 0477, Virginia Tech, Blacksburg, Virginia 24061, USA.
Chem Biodivers. 2010 May;7(5):1124-41. doi: 10.1002/cbdv.200900317.
Systems-biology and infectious-disease (host-pathogen-environment) research and development is becoming increasingly dependent on integrating data from diverse and dynamic sources. Maintaining integrated resources over long periods of time presents distinct challenges. This review describes experiences and lessons learned from integrating data in two five-year projects focused on pathosystems biology: the Pathosystems Resource Integration Center (PATRIC, http://patric.vbi.vt.edu/), whose goal is to develop bioinformatics resources based on genomics data for the research and countermeasures-development communities, and the Resource Center for Biodefense Proteomics Research (RCBPR, http://www.proteomicsresource.org/), whose goal is to develop resources based on experimental data, such as microarray and proteomics data, from diverse sources and technologies. The challenges include integrating genomic sequence and experimental data, data synchronization, data quality control, and usability engineering. We present examples of a variety of data-integration problems drawn from our experiences with PATRIC and RCBPR, as well as open research questions related to long-term sustainability, and describe the next steps toward meeting these challenges. Novel contributions of this work include 1) an approach for addressing discrepancies between experiment results and interpreted results, and 2) expanding the range of data-integration techniques to include usability engineering at the presentation level.