Wu Tsung-Jung, Schriml Lynn M, Chen Qing-Rong, Colbert Maureen, Crichton Daniel J, Finney Richard, Hu Ying, Kibbe Warren A, Kincaid Heather, Meerzaman Daoud, Mitraka Elvira, Pan Yang, Smith Krista M, Srivastava Sudhir, Ward Sari, Yan Cheng, Mazumder Raja
Department of Biochemistry and Molecular Medicine, George Washington University, Washington, DC 20037, USA, Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Center for Bioinformatics and Information Technology, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, NASA Jet Propulsion Laboratory, Pasadena, CA, USA, Division of Cancer Prevention, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, Wellcome Trust Sanger Institute, Cambridge, UK and McCormick Genomic and Proteomic Center, George Washington University, Washington, DC 20037, USA.
Database (Oxford). 2015 Apr 4;2015:bav032. doi: 10.1093/database/bav032. Print 2015.
Bio-ontologies provide terminologies that let the scientific community describe biomedical entities in a standardized manner. Multiple initiatives are developing biomedical terminologies to provide better annotation, data integration and mining capabilities. Terminology resources devised for different purposes inherently diverge in content and structure. A major issue in biomedical data integration is the emergence of overlapping terms, ambiguous classifications and inconsistencies across databases and publications. The Disease Ontology (DO) was developed over the past decade to address data integration, standardization and annotation issues for human disease data. We have established a DO cancer project to provide a focused view of cancer terms within the DO. The DO cancer project mapped 386 cancer terms from the Catalogue of Somatic Mutations in Cancer (COSMIC), The Cancer Genome Atlas (TCGA), International Cancer Genome Consortium, Therapeutically Applicable Research to Generate Effective Treatments, Integrative Oncogenomics and the Early Detection Research Network into a cohesive set of 187 DO terms represented by 63 top-level DO cancer terms. For example, the COSMIC term 'kidney, NS, carcinoma, clear_cell_renal_cell_carcinoma' and the TCGA term 'Kidney renal clear cell carcinoma' were both grouped to the term 'Disease Ontology Identification (DOID):4467 / renal clear cell carcinoma', which was mapped to the TopNodes_DOcancerslim term 'DOID:263 / kidney cancer'. Mapping diverse cancer terms to the DO and using top-level terms (DO slims) will enable pan-cancer analysis across datasets generated from any of the cancer term sources, where pan-cancer means including or relating to all or multiple types of cancer. The terms can be browsed on the DO web site (http://www.disease-ontology.org) and downloaded from the DO's Apache Subversion or GitHub repositories. Database URL: http://www.disease-ontology.org
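The two-step mapping described in the abstract (source term to DO term, then DO term to top-level slim term) can be sketched as a pair of lookup tables. This is only an illustrative sketch, not the project's actual implementation or file format: the dictionary names, the function names `to_slim` and `group_by_slim`, and the fallback rule for DO terms without a slim entry are all assumptions. The only data included are the two source terms and the two DOIDs quoted in the abstract.

```python
from collections import defaultdict

# (source, source term) -> DO term ID.
# Only the two pairs quoted in the abstract; the full project maps 386 terms.
SOURCE_TO_DO = {
    ("COSMIC", "kidney, NS, carcinoma, clear_cell_renal_cell_carcinoma"): "DOID:4467",
    ("TCGA", "Kidney renal clear cell carcinoma"): "DOID:4467",
}

# DO term ID -> top-level slim term ID (TopNodes_DOcancerslim in the abstract).
DO_TO_SLIM = {
    "DOID:4467": "DOID:263",
}

# Human-readable labels for the IDs used above.
DO_LABELS = {
    "DOID:4467": "renal clear cell carcinoma",
    "DOID:263": "kidney cancer",
}


def to_slim(source, term):
    """Return the slim DOID for a source term, or None if the term is unmapped."""
    doid = SOURCE_TO_DO.get((source, term))
    if doid is None:
        return None
    # Assumption: a DO term with no slim entry is treated as its own slim term.
    return DO_TO_SLIM.get(doid, doid)


def group_by_slim(records):
    """Group (source, term) records by slim DOID, skipping unmapped terms."""
    groups = defaultdict(list)
    for source, term in records:
        slim = to_slim(source, term)
        if slim is not None:
            groups[slim].append((source, term))
    return dict(groups)


if __name__ == "__main__":
    records = list(SOURCE_TO_DO)
    for slim, members in group_by_slim(records).items():
        print(slim, DO_LABELS.get(slim, "?"), len(members))
```

Grouping records from different sources under the same slim term is what makes the pan-cancer comparison possible: datasets annotated with COSMIC or TCGA vocabulary end up under one shared key.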