Deus Helena F, Stanislaus Romesh, Veiga Diogo F, Behrens Carmen, Wistuba Ignacio I, Minna John D, Garner Harold R, Swisher Stephen G, Roth Jack A, Correa Arlene M, Broom Bradley, Coombes Kevin, Chang Allen, Vogel Lynn H, Almeida Jonas S
Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America.
PLoS One. 2008 Aug 13;3(8):e2946. doi: 10.1371/journal.pone.0002946.
BACKGROUND: Data, data everywhere. The diversity and magnitude of the data generated in the Life Sciences defy automated articulation among complementary efforts. The additional need in this field to manage data ownership and access permissions compounds the difficulty significantly. This is particularly the case when the integration spans multiple domains and disciplines, and even more so when it includes clinical and high-throughput molecular data.
METHODOLOGY/PRINCIPAL FINDINGS: The emergence of Semantic Web technologies brings the promise of meaningful interoperation between data and analysis resources. In this report we identify a core model for biomedical Knowledge Engineering applications and demonstrate how this new technology can be used to weave a management model in which multiple intertwined data structures are hosted and managed by multiple authorities in a distributed management infrastructure. Specifically, the demonstration links data sources associated with the Lung Cancer SPORE awarded to The University of Texas MD Anderson Cancer Center at Houston and the Southwestern Medical Center at Dallas. A software prototype was developed, and its design has been made publicly available at www.s3db.org as an open-source instrument for shared, distributed data management.
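The abstract does not spell out the core model, so the following is only a hypothetical sketch of the general idea it describes: each authority (here labelled "houston" and "dallas" after the two SPORE sites) hosts its own subject-predicate-object statements and decides locally who may read them, while shared identifiers let a query cross from one site's data into the other's. The class `Authority`, the functions `federated_view` and `follow`, and identifiers such as `patient:17` and `array:9` are invented for illustration; this is not the interface of the prototype at www.s3db.org.

```python
from dataclasses import dataclass, field

# Hypothetical model for illustration only; not the published prototype's interface.
# Statements are (subject, predicate, object) tuples, as in RDF.


@dataclass
class Authority:
    """One institution that hosts its own statements and controls who may read them."""
    name: str
    triples: set = field(default_factory=set)
    readers: set = field(default_factory=set)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.add((subject, predicate, obj))

    def grant_read(self, user: str) -> None:
        self.readers.add(user)

    def visible_to(self, user: str) -> set:
        # Access is decided locally by the authority that hosts the data.
        return set(self.triples) if user in self.readers else set()


def federated_view(authorities: list, user: str) -> set:
    """Union of the statements a user may read across all authorities."""
    merged: set = set()
    for authority in authorities:
        merged |= authority.visible_to(user)
    return merged


def follow(triples: set, start: str, path: list) -> set:
    """Walk a chain of predicates from a start node; shared identifiers
    let the walk cross from one authority's data into another's."""
    nodes = {start}
    for predicate in path:
        nodes = {o for (s, p, o) in triples if p == predicate and s in nodes}
    return nodes


# Usage: a clinical record at one site links to molecular data at the other.
houston = Authority("houston")
dallas = Authority("dallas")
houston.add("patient:17", "hasSpecimen", "specimen:A1")
dallas.add("specimen:A1", "hasExpressionProfile", "array:9")
houston.grant_read("analyst")
dallas.grant_read("analyst")
houston.grant_read("clinician")  # clinician has no read access to the Dallas store

path = ["hasSpecimen", "hasExpressionProfile"]
print(follow(federated_view([houston, dallas], "analyst"), "patient:17", path))    # {'array:9'}
print(follow(federated_view([houston, dallas], "clinician"), "patient:17", path))  # set()
```

In this sketch no central copy of the data exists: each query assembles a per-user view from whatever each authority is willing to expose, which is one simple way to keep ownership and permissions with the hosting institution.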
CONCLUSIONS/SIGNIFICANCE: Semantic Web technologies have the potential to address the need for distributed and evolvable representations that are critical for systems biology and translational biomedical research. As this technology is incorporated into application development, we can expect both general-purpose productivity software and domain-specific software installed on our personal computers to become increasingly integrated with the relevant remote resources. In this scenario, the acquisition of a new dataset should automatically trigger the delegation of its analysis.