Olivier M, Petitjean A, Teague J, Forbes S, Dunnick J K, den Dunnen J T, Langerød A, Wilkinson J M, Vihinen M, Cotton R G H, Hainaut P
Group of Molecular Carcinogenesis and Biomarkers, International Agency for Research on Cancer, World Health Organization, Lyon, France.
Hum Mutat. 2009 Mar;30(3):275-82. doi: 10.1002/humu.20832.
Fewer than 40 locus-specific databases (LSDBs) and one large general database currently curate data on somatic mutations in human cancer genes. These databases differ in scope and use different annotation standards and database systems, which duplicates curation effort and makes it difficult for users to find clear and consistent information. As data on somatic mutations are generated at an increasing pace, a framework is urgently needed to improve the collection of this information and make it more accessible to clinicians, scientists, and epidemiologists, in order to facilitate research on biomarkers. Here we propose a data flow for improving the connectivity between existing databases, and we provide practical guidelines for data reporting, database contents, and annotation standards. These proposals are based on common standards recommended by the Human Genome Variation Society (HGVS), with additions that address the specific requirements of somatic mutations in cancer. Somatic mutations may be used in molecular pathology and clinical studies to characterize tumor types, guide treatment choice, and predict response to treatment and patient outcome, or in epidemiological studies as markers for tumor etiology or exposure assessment. Specific annotations are therefore required to cover these diverse research topics. This initiative is meant to promote collaboration and discussion on these issues, and the development of adequate resources that would prevent the loss of extremely valuable information generated by years of basic and clinical research.