Chelala Claude, Hahn Stephan A, Whiteman Hannah J, Barry Sayka, Hariharan Deepak, Radon Tomasz P, Lemoine Nicholas R, Crnogorac-Jurcevic Tatjana
Centre for Molecular Oncology, Institute of Cancer & CR-UK Clinical Centre, Barts & The London School of Medicine (QMUL), Charterhouse Square London EC1M 6BQ, UK.
BMC Genomics. 2007 Nov 28;8:439. doi: 10.1186/1471-2164-8-439.
Pancreatic cancer is the fifth leading cause of cancer death in both males and females. In recent years, a wealth of gene and protein expression studies has been published, broadening our understanding of pancreatic cancer biology. Because of the explosive growth in publicly available data from multiple sources, it is becoming increasingly difficult for individual researchers to integrate these data into their current research programmes. The Pancreatic Expression database, a generic web-based system, aims to close this gap by providing the research community with an open-access tool, not only to mine currently available pancreatic cancer data sets but also to include their own data in the database.
Currently, the database holds 32 datasets comprising 7636 gene expression measurements extracted from 20 different published gene or protein expression studies of various pancreatic cancer types, pancreatic precursor lesions (PanINs) and chronic pancreatitis. The pancreatic data are stored in a data management system based on the BioMart technology, alongside human genome gene and protein annotations, sequence, homologue, SNP and antibody data. The database can be interrogated through both a web-based query interface and web services, using combined criteria from the pancreatic data (disease stages, regulation, differential expression, expression, platform technology, publication) and/or public data (antibodies, genomic region, gene-related accessions, ontology, expression patterns, multi-species comparisons, protein data, SNPs). Thus, our database enables connections between otherwise disparate data sources and allows relatively simple navigation between all data types and annotations.
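As an illustration of the web-service route, the following minimal sketch builds a query document in the general BioMart `Query` XML format, combining pancreatic-specific filters with requested attributes. The dataset, filter and attribute names used here are hypothetical placeholders, not the database's actual identifiers, and no service endpoint is assumed; the real names would need to be taken from the database's own configuration.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Return a BioMart-style XML query string.

    dataset    -- name of the mart dataset to query
    filters    -- dict mapping filter name to filter value
    attributes -- list of attribute names to return as columns
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",   # tab-separated output
        "header": "1",        # include a header row
        "uniqueRows": "1",    # collapse duplicate rows
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    # Filters restrict the result set; all filters are combined.
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    # Attributes select which columns are returned.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


# Hypothetical names, for illustration only: they are assumptions and
# do not come from the published database schema.
example_query = build_biomart_query(
    dataset="pancreatic_expression",
    filters={"disease_stage": "PanIN", "regulation": "up"},
    attributes=["gene_symbol", "platform_technology", "publication"],
)
print(example_query)
```

A document of this kind is typically submitted to a BioMart service as the value of a `query` parameter; combining a pancreatic filter with a public-data attribute in one `Dataset` element mirrors the combined-criteria searches described above.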
The database structure and content provide a powerful and high-speed data-mining tool for cancer research. The database can be used for target discovery (i.e. of biomarkers from body fluids), identification and analysis of genes associated with the progression of cancer, cross-platform meta-analysis, SNP selection for pancreatic cancer association studies, cancer gene promoter analysis, and mining of cancer ontology information. The data model is generic and can be easily extended and applied to other types of cancer. The database is available online, with no restrictions, to the scientific community at http://www.pancreasexpression.org/.