Population Genetics Laboratory, Department of Genomics and Evolutionary Biology, National Institute of Genetics, Mishima, Japan.
Center for Earth Surface System Dynamics, Atmosphere and Ocean Research Institute, University of Tokyo, Kashiwa, Japan.
Mol Biol Evol. 2021 Apr 13;38(4):1665-1676. doi: 10.1093/molbev/msaa296.
We developed dbCNS (http://yamasati.nig.ac.jp/dbcns), a new database for conserved noncoding sequences (CNSs). CNSs exist in many eukaryotes and are assumed to be involved in protein expression control. Version 1 of dbCNS, introduced here, includes a powerful and precise CNS identification pipeline for multiple vertebrate genomes. Mutations in CNSs may induce morphological changes and cause genetic diseases. For this reason, many vertebrate CNSs have been identified, with special reference to primate genomes. We integrated ∼6.9 million CNSs from many vertebrate genomes into dbCNS, which allows users to extract CNSs near genes of interest using keyword searches. In addition to CNSs, dbCNS contains published genome sequences of 161 species. With purposeful taxonomic sampling of genomes, users can employ CNSs as queries to reconstruct CNS alignments and phylogenetic trees, to evaluate CNS modifications, acquisitions, and losses, and to roughly identify species whose CNSs have accelerated substitution rates. dbCNS also provides links to dbSNP for searching pathogenic single-nucleotide polymorphisms in human CNSs. Thus, dbCNS connects morphological changes with genetic diseases. A test analysis using 38 gnathostome genomes was completed within 30 s. dbCNS results can also be used to evaluate CNSs identified by other stand-alone programs using genome-scale data.
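As a rough illustration of what "roughly identifying species with accelerated substitution rates" from a CNS alignment could look like, the following minimal Python sketch computes the uncorrected p-distance of each species to a reference sequence and flags those above a cutoff. This is not the dbCNS pipeline: the function names (`p_distance`, `flag_divergent`), the toy sequences, the species labels, and the 0.1 threshold are all hypothetical and chosen only for illustration, and a distance to a single reference is a much cruder proxy than a tree-based rate test.

```python
from typing import Dict, List


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return diffs / compared


def flag_divergent(alignment: Dict[str, str], reference: str,
                   threshold: float) -> List[str]:
    """Return names of sequences whose p-distance to the reference exceeds threshold."""
    ref = alignment[reference]
    return sorted(
        name
        for name, seq in alignment.items()
        if name != reference and p_distance(ref, seq) > threshold
    )


# Hypothetical toy alignment of one CNS (not real data).
toy_alignment = {
    "human":     "ACGTACGTACGTACGTACGT",
    "mouse":     "ACGTACGTACGTACGAACGT",
    "chicken":   "ACGTACGTACCTACGTACGT",
    "species_x": "ACTTAGGTAC-TTCGAACGA",
}

if __name__ == "__main__":
    # Threshold of 0.1 is an arbitrary illustrative cutoff.
    print(flag_divergent(toy_alignment, reference="human", threshold=0.1))
```

In practice, a more defensible analysis would compare branch lengths on a phylogenetic tree built from the alignment rather than raw distances to one reference.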