Babbi Giulia, Savojardo Castrense, Baldazzi Davide, Martelli Pier Luigi, Casadio Rita
Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy.
Centro di Riferimento Oncologico (CRO), Aviano, Italy.
Front Mol Biosci. 2022 Sep 16;9:966927. doi: 10.3389/fmolb.2022.966927. eCollection 2022.
Grouping residue variations in a protein according to their physicochemical properties reduces the dimensionality of all the possible substitutions in a variant with respect to the wild type. Using a large dataset of proteins with disease-related and benign variations, derived by merging Humsavar and ClinVar data, we investigate to what extent our physicochemical grouping procedure can help determine whether patterns of variation types are related to specific groups of diseases and whether they occur in Pfam and/or InterPro gene domains. We download 75,145 germline disease-related and benign variations of 3,605 genes, group them according to physicochemical categories, and map them into Pfam and InterPro gene domains. Statistically validated analysis indicates that each cluster of genes associated with Mondo anatomical system categorizations is characterized by a specific variation pattern. Patterns identify specific Pfam and InterPro domain-Mondo category associations. Our data suggest that the association of variation patterns with Mondo categories is unique and may help in associating gene variants with genetic diseases. This work corroborates, in a much larger dataset, previous observations from our group.
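The dimensionality reduction described above can be illustrated with a minimal Python sketch. The abstract does not list the physicochemical classes, so the four-class scheme below (apolar, aromatic, polar, charged) is an assumption chosen for illustration, as are the names `CLASSES`, `variation_type`, and `variation_pattern`. Under this assumed scheme, the 380 possible residue-to-residue substitutions collapse into 16 class-to-class variation types, and a gene set can be summarized by the relative frequency of each type.

```python
from collections import Counter

# Assumed four-class grouping of the 20 standard residues (one-letter codes).
# The abstract does not specify the classes; this scheme is illustrative only.
CLASSES = {
    "apolar": "GAVPLIM",
    "aromatic": "FWY",
    "polar": "STCNQH",
    "charged": "DEKR",
}

# Reverse lookup: residue -> class name.
RESIDUE_TO_CLASS = {
    residue: name for name, residues in CLASSES.items() for residue in residues
}


def variation_type(wild_type, variant):
    """Map a substitution (wild-type residue, variant residue) to a class pair."""
    return (RESIDUE_TO_CLASS[wild_type], RESIDUE_TO_CLASS[variant])


def variation_pattern(substitutions):
    """Return the relative frequency of each class-to-class variation type."""
    counts = Counter(variation_type(wt, var) for wt, var in substitutions)
    total = sum(counts.values())
    return {vtype: n / total for vtype, n in counts.items()}


# Hypothetical substitutions, not taken from the Humsavar/ClinVar dataset.
example = [("R", "W"), ("R", "H"), ("K", "F"), ("G", "A")]
pattern = variation_pattern(example)
```

Comparing such frequency profiles between disease-related and benign variations, or between gene clusters, is one way to look for the kind of variation patterns the abstract refers to; the statistical validation used in the study is not reproduced here.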