Huang Taoying, Shenoy Pareen J, Sinha Rajni, Graiser Michael, Bumpers Kevin W, Flowers Christopher R
Winship Cancer Institute, School of Medicine, Emory University, Atlanta, GA, USA.
Cancer Inform. 2009 Apr 3;8:45-64. doi: 10.4137/cin.s940.
Lymphomas are the fifth most common cancer in the United States and comprise numerous histological subtypes. Integrating existing clinical information on lymphoma patients provides a platform for understanding biological variability in presentation and treatment response, and aids the development of novel therapies. We developed a lymphoma database compliant with the cancer Biomedical Informatics Grid (caBIG) Silver level, called the Lymphoma Enterprise Architecture Data-system (LEAD), which integrates pathology, pharmacy, laboratory, cancer registry, clinical trials, and clinical data from institutional databases. We used the Cancer Common Ontological Representation Environment Software Development Kit (caCORE SDK), provided by the National Cancer Institute's Center for Bioinformatics, to establish the LEAD platform for data management. The system generated by the caCORE SDK uses an n-tier architecture with open Application Programming Interfaces, controlled vocabularies, and registered metadata to achieve semantic integration across multiple cancer databases. We demonstrated that the data elements and structures within LEAD could be used to manage clinical research data from phase 1 clinical trials and cohort studies, as well as registry data from the Surveillance, Epidemiology, and End Results (SEER) database. This work provides a clear example of how semantic technologies from caBIG can be applied to support a wide range of clinical and research tasks and to integrate data from disparate systems into a single architecture. It illustrates the central importance of caBIG to the management of clinical and biological data.
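The idea of semantic integration through controlled vocabularies and registered metadata can be illustrated with a minimal Python sketch. It is not the LEAD implementation or the caCORE SDK API. Every name in it (`FIELD_REGISTRY`, `HISTOLOGY_VOCAB`, `harmonize`, `integrate`, and the sample field names and codes) is a hypothetical stand-in chosen for illustration. The sketch assumes each source database uses its own field names and local terms, and that a shared registry maps them onto common data elements.

```python
# Hypothetical metadata registry: (source system, local field) -> common data element.
FIELD_REGISTRY = {
    ("pathology", "mrn"): "patient_id",
    ("pathology", "dx_code"): "histology",
    ("registry", "patient"): "patient_id",
    ("registry", "histology_text"): "histology",
}

# Hypothetical controlled vocabulary: local terms (lowercased) -> one shared term.
HISTOLOGY_VOCAB = {
    "dlbcl": "Diffuse Large B-Cell Lymphoma",
    "diffuse large b cell": "Diffuse Large B-Cell Lymphoma",
    "fl": "Follicular Lymphoma",
    "follicular": "Follicular Lymphoma",
}


def harmonize(source, record):
    """Translate one source-specific record into common data elements."""
    common = {}
    for field, value in record.items():
        element = FIELD_REGISTRY.get((source, field))
        if element is None:
            continue  # field is not registered, so it is not integrated
        if element == "histology":
            key = value.strip().lower()
            if key not in HISTOLOGY_VOCAB:
                raise ValueError(f"term not in controlled vocabulary: {value!r}")
            value = HISTOLOGY_VOCAB[key]
        common[element] = value
    return common


def integrate(sources):
    """Merge harmonized records from all sources, keyed by patient_id."""
    merged = {}
    for source, records in sources.items():
        for record in records:
            common = harmonize(source, record)
            merged.setdefault(common["patient_id"], {}).update(common)
    return merged


if __name__ == "__main__":
    sample = {
        "pathology": [{"mrn": "P001", "dx_code": "DLBCL"}],
        "registry": [{"patient": "P001", "histology_text": "diffuse large B cell"}],
    }
    print(integrate(sample))
```

In this sketch, two systems that describe the same diagnosis with different field names and different local terms resolve to a single record with one shared term, which is the property that registered metadata and controlled vocabularies are meant to provide.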