Zhang Guang Lan, Riemer Angelika B, Keskin Derin B, Chitkushev Lou, Reinherz Ellis L, Brusic Vladimir
Cancer Vaccine Center, Dana-Farber Cancer Institute, 77 Ave Louis Pasteur, Boston, MA 02115, USA, Department of Computer Science, Metropolitan College, Boston University, 808 Commonwealth Ave, Boston, MA 02215, USA, Department of Medicine, Harvard Medical School, 25 Shattuck Street, Boston, MA 02115, USA and German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120 Heidelberg, Germany.
Database (Oxford). 2014 Apr 4;2014:bau031. doi: 10.1093/database/bau031. Print 2014.
High-risk human papillomaviruses (HPVs) cause many cancers, including cervical, anal, vulvar, vaginal, penile and oropharyngeal cancers. To facilitate the diagnosis, prognosis and characterization of these cancers, it is necessary to make full use of the immunological data on HPV available in publications, technical reports and databases. These data vary in granularity, quality and complexity, and extracting knowledge from such a large body of immunological data with data mining techniques remains challenging. To support the integration of data and knowledge in virology and vaccinology, we developed KB-builder, a framework that streamlines the development and deployment of web-accessible immunological knowledge systems. The framework consists of seven major functional modules, each supporting a specific aspect of knowledgebase construction. Using KB-builder, we constructed the Human Papillomavirus T cell Antigen Database (HPVdb). It contains 2781 curated entries for antigenic proteins derived from 18 high-risk and 18 low-risk HPV genotypes. HPVdb also catalogs 191 verified T cell epitopes and 45 verified human leukocyte antigen (HLA) ligands. Primary amino acid sequences of HPV antigens were collected and annotated from UniProtKB. T cell epitopes and HLA ligands were collected by mining the scientific literature and databases. The data were subjected to extensive quality control (redundancy elimination, error detection and vocabulary consolidation). HPVdb integrates a set of computational tools for in-depth analysis, including sequence comparison by BLAST search, multiple alignment of antigens, classification of HPV types by cancer risk, T cell epitope/HLA ligand visualization, T cell epitope/HLA ligand conservation analysis and sequence variability analysis. Predicted Class I and Class II HLA binding peptides for 15 common HLA alleles are included as putative targets. HPVdb is a knowledge-based system that integrates curated data and information with tailored analysis tools to facilitate data mining for HPV vaccinology and immunology. To the best of our knowledge, HPVdb is a unique data source providing a comprehensive list of HPV antigens and peptides. Database URL: http://cvc.dfci.harvard.edu/hpv/.
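The abstract lists T cell epitope conservation analysis and sequence variability analysis among the HPVdb tools but does not say how they are computed. As a rough illustration only, the sketch below uses two common textbook measures, which are assumptions and not HPVdb's documented method: per-column Shannon entropy over a multiple alignment (higher entropy means a more variable position) and epitope conservation as the fraction of sequences containing the epitope verbatim. The function names and toy sequences are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column; gap characters '-' are ignored."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    # H = sum over residue types of p * log2(1/p)
    return sum((n / total) * math.log2(total / n) for n in Counter(residues).values())


def variability_profile(alignment: list[str]) -> list[float]:
    """Entropy of every column of an equal-length (already aligned) set of sequences."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy("".join(seq[i] for seq in alignment)) for i in range(length)]


def epitope_conservation(epitope: str, sequences: list[str]) -> float:
    """Fraction of sequences (gaps removed) containing the epitope as an exact substring.

    Assumption: exact matching only; real conservation tools often allow a
    user-defined identity threshold instead.
    """
    if not sequences:
        return 0.0
    hits = sum(1 for seq in sequences if epitope in seq.replace("-", ""))
    return hits / len(sequences)


if __name__ == "__main__":
    # Toy, made-up aligned sequences (not real HPV proteins).
    toy_alignment = ["ACDEF", "ACDEF", "ACKE-"]
    print([round(h, 3) for h in variability_profile(toy_alignment)])
    print(epitope_conservation("CDE", toy_alignment))
```

Entropy is computed as a sum of p·log2(1/p) terms so that fully conserved columns come out as exactly 0.0.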
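HPVdb includes predicted Class I and Class II HLA binding peptides for 15 common HLA alleles, but the abstract does not name the predictor or its settings. Binding predictors are typically run over every overlapping peptide window of an antigen, so the sketch below shows only that enumeration step. The window lengths (9 residues for Class I, 15 for Class II) are common choices assumed here for illustration, not HPVdb's documented parameters; `candidate_peptides`, `candidates_by_class` and the toy sequence are hypothetical, and allele-specific scoring would be done by a separate prediction tool that is not shown.

```python
# Assumed window lengths: 9-mers are the most frequent HLA Class I ligand length,
# and 15-mers are a common input window for HLA Class II binding predictors.
CLASS_I_LENGTH = 9
CLASS_II_LENGTH = 15


def candidate_peptides(sequence: str, length: int) -> list[tuple[int, str]]:
    """Return (1-based start position, peptide) for every overlapping window."""
    if length <= 0:
        raise ValueError("length must be positive")
    sequence = sequence.upper()
    # A sequence shorter than the window yields an empty list.
    return [(i + 1, sequence[i:i + length]) for i in range(len(sequence) - length + 1)]


def candidates_by_class(sequence: str) -> dict[str, list[tuple[int, str]]]:
    """Group Class I and Class II candidate windows for one antigen sequence."""
    return {
        "class_I": candidate_peptides(sequence, CLASS_I_LENGTH),
        "class_II": candidate_peptides(sequence, CLASS_II_LENGTH),
    }


if __name__ == "__main__":
    toy_antigen = "ACDEFGHIKLMNPQRSTVWY"  # made-up 20-residue sequence, not an HPV protein
    windows = candidates_by_class(toy_antigen)
    print(len(windows["class_I"]), len(windows["class_II"]))  # → 12 6
    print(windows["class_I"][0])  # → (1, 'ACDEFGHIK')
```

Keeping the 1-based start position with each peptide makes it straightforward to map predicted binders back onto the antigen sequence for visualization.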