Unit of Medical Genetics, Department of Medical Sciences, University of Ferrara, Ferrara, Italy.
Dubowitz Neuromuscular Unit, Institute of Child Health, University College London, London, United Kingdom.
PLoS One. 2022 Mar 31;17(3):e0265469. doi: 10.1371/journal.pone.0265469. eCollection 2022.
We designed a novel strategy to define codon usage bias (CUB) in 6 specific small cohorts of human genes. We calculated codon usage (CU) values in 29 non-disease-causing (NDC) and 31 disease-causing (DC) human genes that are highly expressed in 3 distinct tissues: kidney, muscle, and skin. We applied our strategy to the same selected genes annotated in 15 mammalian species. We obtained CUB hierarchical clusters for each gene cohort, which showed tissue-specific and disease-specific CUB fingerprints. We showed that DC genes (especially those expressed in muscle) display a low CUB, readily recognizable in codon hierarchical clustering. We defined the extremely biased codons as "zero codons" and found that their number is significantly higher in DC genes in all tissues, and that this trend is conserved across mammals. Based on this calculation in different gene cohorts, we identified 5 codons that are more differentially used across genes and mammals, underlining that some genes have preferred synonymous codons. Because the muscle genes form clear clusters and, among them, the dystrophin gene surprisingly does not show any "zero codon", we adopted a novel approach to study CUB, which we called "mapping-on-codons". We positioned 2828 dystrophin missense and nonsense pathogenic variations on their respective codons, highlighting that their frequency and occurrence do not depend on the CU values. We conclude that our strategy allows the identification of hierarchical clustering of CU values as gene cohort-specific fingerprints, with a recognizable trend across mammals. In DC muscle genes, a disease-related fingerprint can also be observed, allowing discrimination between DC and NDC genes. We propose that our strategy, which studies CU in specific gene cohorts such as rare disease genes and tissue-specific genes, may provide novel information about the role of CUB in human and medical genetics, with implications for the interpretation of synonymous variations and for codon optimization algorithms.
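As an illustration only, the kind of per-gene CU calculation described above can be sketched in Python. The abstract does not state the exact CU metric or the threshold used to call a "zero codon", so this sketch makes two assumptions: CU is taken as the relative frequency of each codon among the synonymous codons of its amino acid, and a codon that is never used in the coding sequence stands in for an extremely biased one. The function names `codon_counts`, `relative_usage`, and `unused_codons` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_counts(cds):
    """Count in-frame codons of a coding sequence (DNA or RNA letters)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    # Skip triplets containing ambiguous bases such as N.
    return Counter(c for c in codons if c in CODON_TABLE)


def relative_usage(counts):
    """Assumed CU metric: codon count / total count of its synonymous family.

    Amino acids absent from the sequence are left out entirely, so a value
    of 0.0 always means "family present, this codon never used".
    """
    totals = Counter()
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n
    return {
        codon: counts[codon] / totals[aa]
        for codon, aa in CODON_TABLE.items()
        if totals[aa]
    }


def unused_codons(usage):
    """Assumed stand-in for "zero codons": synonymous codons never used."""
    return sorted(c for c, f in usage.items() if f == 0.0)


if __name__ == "__main__":
    toy_cds = "ATGGCTGCTGCCTAA"  # toy sequence, not a real gene
    usage = relative_usage(codon_counts(toy_cds))
    print(round(usage["GCT"], 3))
    print(unused_codons(usage))
```

Applied per gene and per species, a table of such values could then be passed to any hierarchical clustering routine; the clustering settings used in the study are not given in the abstract and are not reproduced here.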