Color Genomics, 831 Mitten Road, Suite 100, Burlingame, CA, USA.
Database (Oxford). 2019 Jan 1;2019:baz013. doi: 10.1093/database/baz013.
Next-generation sequencing multi-gene panels have greatly improved the diagnostic yield and cost-effectiveness of genetic testing and are rapidly being integrated into clinical practice for hereditary cancer risk. With this technology comes a dramatic increase in the volume, type and complexity of data. These invaluable data, however, are too often buried or inaccessible to researchers, especially those without strong analytical or programming skills. To effectively share comprehensive, integrated genotypic-phenotypic data, we built Color Data, a publicly available, cloud-based database that supports broad access and data literacy. The database contains data from 50 000 individuals who were sequenced for 30 genes associated with hereditary cancer risk, and it provides information on allele frequency and variant classification, as well as associated phenotypic information such as demographics and personal and family history. A user-friendly interface allows researchers to run their own filtered queries, and query results can be shared and/or downloaded. Rapid and broad dissemination of these research results will help increase the value of scientific resources and data and reduce their waste. Furthermore, the database can scale quickly and supports the integration of additional genes and human hereditary conditions. We hope that this database will help researchers and scientists explore genotype-phenotype correlations in hereditary cancer, identify novel variants for functional analysis and enable data-driven drug discovery and development.