Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Domžale, Slovenia.
OMICS. 2018 Jun;22(6):410-421. doi: 10.1089/omi.2018.0046. Epub 2018 May 10.
Harnessing genomics big data requires innovation in how we extract and interpret biologically relevant variants. Currently, there is no established catalog of prioritized missense variants associated with deleterious protein function phenotypes. To the best of our knowledge, we report here the first genome-wide prioritization of sequence variants with the most deleterious effect on protein function (potentially deleterious variants [pDelVars]) in nine vertebrate species: human, cattle, horse, sheep, pig, dog, rat, mouse, and zebrafish. The analysis was conducted using the Ensembl/BioMart tool. Genes containing pDelVars in the largest number of examined species were identified using a Python script. Multiple genomic alignments of the selected genes were built to identify interspecies orthologous potentially deleterious variants, which we defined as "ortho-pDelVars." Genome-wide prioritization revealed that in humans, 0.12% of the known variants are predicted to be deleterious. In seven of the nine examined vertebrate species, the genes encoding the multiple PDZ domain crumbs cell polarity complex component (MPDZ) and the transforming acidic coiled-coil containing protein 2 (TACC2) contain pDelVars. Five interspecies ortho-pDelVars were identified in three genes. These findings offer new ways to harness genomics big data by facilitating the identification of functional polymorphisms in humans and animal models, and thus provide a future basis for optimizing protocols for whole-genome prioritization of pDelVars and screening of orthologous sequence variants. The approach presented here can inform various postgenomic applications such as personalized medicine and the multiomics study of health interventions (iatromics).
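The abstract describes two computational steps: ranking genes by how many species carry pDelVars in them, and using multiple alignments to find variants that fall at the same aligned position across species (ortho-pDelVars). The sketch below illustrates both steps under stated assumptions; it is not the authors' script. It assumes the per-species pDelVars have already been exported (for example, from Ensembl BioMart with SIFT/PolyPhen calls; the study's exact filters are not given here) as `(gene_symbol, protein_position)` pairs, that gene symbols are comparable across species after upper-casing (a simplification; real orthology would come from a resource such as Ensembl Compara), and that each gene's alignment is a dict of gapped protein sequences. All function names and toy inputs are hypothetical.

```python
from collections import defaultdict


def genes_by_species_count(pdelvars):
    """Rank genes by the number of species in which they carry at least one pDelVar.

    pdelvars: dict mapping species -> iterable of (gene_symbol, protein_position).
    Assumption: upper-cased gene symbols stand in for orthology mappings.
    """
    species_per_gene = defaultdict(set)
    for species, variants in pdelvars.items():
        for gene, _pos in variants:
            species_per_gene[gene.upper()].add(species)
    # Most species first; alphabetical order breaks ties deterministically.
    return sorted(
        ((gene, len(spp)) for gene, spp in species_per_gene.items()),
        key=lambda item: (-item[1], item[0]),
    )


def residue_to_column(aligned_seq, position):
    """Return the 0-based alignment column holding the 1-based residue `position`."""
    if position < 1:
        raise ValueError("protein positions are 1-based")
    count = 0
    for column, char in enumerate(aligned_seq):
        if char != "-":  # '-' marks an alignment gap, not a residue
            count += 1
            if count == position:
                return column
    raise ValueError(f"position {position} exceeds sequence length {count}")


def ortho_columns(alignment, variant_positions, min_species=2):
    """Find alignment columns hit by pDelVars in at least `min_species` species.

    alignment: dict species -> gapped protein sequence of one gene.
    variant_positions: dict species -> 1-based protein positions of pDelVars in that gene.
    """
    species_per_column = defaultdict(set)
    for species, positions in variant_positions.items():
        for pos in positions:
            column = residue_to_column(alignment[species], pos)
            species_per_column[column].add(species)
    return {
        column: sorted(spp)
        for column, spp in species_per_column.items()
        if len(spp) >= min_species
    }


if __name__ == "__main__":
    # Toy inputs for illustration only; not data from the study.
    toy_pdelvars = {
        "human": [("MPDZ", 12), ("TACC2", 40)],
        "mouse": [("Mpdz", 10), ("Tacc2", 55)],
        "rat": [("Mpdz", 30)],
    }
    print(genes_by_species_count(toy_pdelvars))
    toy_alignment = {"human": "MKT-LAV", "mouse": "MK-ALAV"}
    print(ortho_columns(toy_alignment, {"human": [4], "mouse": [4]}))
```

With these toy inputs, MPDZ ranks first (three species), and residue 4 of both sequences maps to alignment column 4, so that column is reported as a candidate ortho-pDelVar site. A stricter variant of this check might also require the same reference amino acid at that column; the abstract does not say which criterion was used.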