College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China.
BGI-Shenzhen, Shenzhen 518083, Guangdong, China.
Nucleic Acids Res. 2023 Jan 6;51(D1):D890-D895. doi: 10.1093/nar/gkac638.
A high-quality genome variation database derived from a large-scale population is one of the most important pieces of infrastructure for genomics, clinical and translational medicine research. Here, we developed the Chinese Millionome Database (CMDB), which contains 9.04 million single nucleotide variants (SNVs) with allele frequency information derived from low-coverage (0.06×-0.1×) whole-genome sequencing (WGS) data of 141 431 unrelated healthy Chinese individuals. These individuals were recruited from 31 of the 34 administrative divisions in China and include Han Chinese as well as 36 ethnic minority groups. Housing WGS data from a multi-ethnic, geographically widespread Chinese population, CMDB is the most representative and comprehensive Chinese population genome database to date. Researchers can quickly search by variant, gene or genomic region to obtain variant information, including basic variant details, allele frequency, genic annotation and an overview of frequencies in global populations. CMDB also provides information on the association of variants with a range of phenotypes, including height, BMI, maternal age and twin pregnancy, which researchers can use to conduct meta-analyses of related phenotypes. CMDB is freely available at https://db.cngb.org/cmdb/.
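As an illustration of the kind of meta-analysis mentioned above, the following is a minimal sketch of a standard fixed-effect, inverse-variance weighted combination of per-study association estimates. It assumes that an effect size and its standard error are available for the same variant and phenotype from each study; the function name, the input layout and the example numbers are hypothetical and do not reflect CMDB's actual export format or results.

```python
import math


def fixed_effect_meta(estimates):
    """Combine per-study (beta, standard_error) pairs for one variant.

    Uses inverse-variance weighting: each study is weighted by 1 / se^2.
    Returns the pooled beta, its standard error, the z-score and a
    two-sided p-value under a normal approximation.
    """
    weights = [1.0 / (se * se) for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)) for standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value


# Hypothetical inputs: one estimate taken from a database such as CMDB,
# one from an independent cohort (illustrative values, not real data).
studies = [(0.10, 0.02), (0.20, 0.02)]
beta, se, z, p = fixed_effect_meta(studies)
print(f"pooled beta={beta:.4f}, se={se:.4f}, z={z:.2f}, p={p:.3g}")
```

A fixed-effect model is the simplest choice and assumes the studies estimate a single shared effect; when the cohorts differ in ancestry or phenotype definition, a random-effects model may be more appropriate.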