iHuman Institute, ShanghaiTech University, Shanghai, 201210, China.
School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China.
Sci China Life Sci. 2022 Dec;65(12):2539-2551. doi: 10.1007/s11427-021-2081-6. Epub 2022 Jun 10.
Olfactory receptors are poorly annotated for most genome-sequenced chordates. To address this deficiency, we developed an nhmmer-based olfactory receptor annotation tool, Genome2OR (https://github.com/ToHanwei/Genome2OR.git), and used it to process 1,695 sequenced chordate genomes in the NCBI Assembly database as of January 2021. In total, 765,248 olfactory receptor genes were annotated, comprising 404,426 functional genes and 360,822 pseudogenes, which represents a four-fold increase in the number of annotated olfactory receptors. Based on the annotation data, we built the Chordata Olfactory Receptor Database (CORD, https://cord.ihuman.shanghaitech.edu.cn) for archiving, analysing and disseminating the data. Beyond the primary data, CORD offers derivative information, including pictures of species, cross-references to public databases, structural models, sequence similarity networks and sequence profiles. Furthermore, we performed brief analyses of these receptors: we built a large protein sequence similarity network covering all receptors in the database, clustered the receptors into 20 communities, and classified the 20 communities into three categories based on their presence or absence in ray-finned fish and/or lobe-finned fish. By analysing their sequences and structural models, we infer that olfactory receptors should have unique activation and desensitization mechanisms. We believe that CORD can benefit researchers and members of the general public who are interested in olfaction.