Munz Matthias, Khodaygani Mohammad, Aherrahrou Zouhair, Busch Hauke, Wohlers Inken
Medical Systems Biology Division, Lübeck Institute of Experimental Dermatology and Institute for Cardiogenetics, University of Lübeck, Lübeck, Germany.
Institute for Cardiogenetics, University of Lübeck, Lübeck, Germany.
PeerJ. 2021 Mar 11;9:e11017. doi: 10.7717/peerj.11017. eCollection 2021.
Mice are the most widely used animal model for studying genotype-to-phenotype relationships. Inbred mice are genetically identical, which eliminates genetic heterogeneity and makes them particularly useful for genetic studies. Many different strains have been bred over decades and a vast amount of phenotypic data has been generated. In addition, whole-genome-sequencing-based genome-wide genotype data for many widely used inbred strains has recently been released. Here, we present an approach for in silico fine-mapping that uses genotypic data of 37 inbred mouse strains together with phenotypic data provided by the user to propose candidate variants and genes for the phenotype under study. Public genome-wide genotype data covering more than 74 million variant sites is queried efficiently in real time to provide those variants that are compatible with the observed phenotype differences between strains. Variants can be filtered by molecular consequence and by the corresponding molecular impact. Candidate gene lists can be generated from variant lists on the fly. Fine-mapping together with annotation or filtering of results is provided in a Bioconductor package called MouseFM. To characterize candidate variant lists under various settings, MouseFM was applied to two expression data sets across 20 inbred mouse strains, one from neutrophils and one from CD4 T cells. Fine-mapping was assessed for about 10,000 genes in each data set and identified candidate variants and haplotypes for many expression quantitative trait loci (eQTLs) reported previously based on these data. For albinism, MouseFM reports only one variant allele of moderate or high molecular impact that only albino mice share: a missense variant in the gene, reported previously to be causal for this phenotype. In silico fine-mapping of interfrontal bone formation, using four strains with and five strains without an interfrontal bone, yields 12 candidate genes. Of these, three are related to skull shaping abnormality. Finally, fine-mapping of dystrophic cardiac calcification, comparing nine strains showing the phenotype with eight strains lacking it, identifies only one moderate-impact variant in the known causal gene. In summary, these applications illustrate the benefit of using MouseFM for candidate variant and gene identification.
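The core idea of the abstract, keeping only variants whose alleles are compatible with the phenotype difference between two groups of strains and then filtering by molecular impact and collapsing to genes, can be illustrated with a minimal Python sketch. This is not the MouseFM implementation or its API (MouseFM is an R/Bioconductor package). The function names, strain labels, variant identifiers, gene names, and toy genotypes below are all hypothetical. The sketch assumes biallelic sites coded 0 (reference) or 1 (alternative) per strain and a strict rule in which every strain within a group must carry the same allele. The impact labels follow the HIGH/MODERATE/LOW/MODIFIER convention used by the Ensembl Variant Effect Predictor.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy genotypes: variant id -> {strain: allele}, 0 = ref, 1 = alt,
# None = missing call. Real data would cover millions of sites.
TOY_GENOTYPES: Dict[str, Dict[str, Optional[int]]] = {
    "var1": {"S1": 0, "S2": 0, "S3": 1, "S4": 1},
    "var2": {"S1": 0, "S2": 1, "S3": 1, "S4": 1},
    "var3": {"S1": 1, "S2": 1, "S3": 0, "S4": 0},
    "var4": {"S1": 0, "S2": 0, "S3": 0, "S4": 0},
    "var5": {"S1": 1, "S2": 1, "S3": 0, "S4": None},
}

# Hypothetical annotations (impact labels in the Ensembl VEP style).
TOY_IMPACT: Dict[str, str] = {
    "var1": "MODERATE", "var2": "HIGH", "var3": "MODIFIER",
    "var4": "LOW", "var5": "MODERATE",
}
TOY_GENE: Dict[str, str] = {
    "var1": "GeneA", "var2": "GeneB", "var3": "GeneA",
    "var4": "GeneC", "var5": "GeneD",
}


def compatible_variants(
    genotypes: Dict[str, Dict[str, Optional[int]]],
    group_a: List[str],
    group_b: List[str],
) -> List[str]:
    """Return variants where each group is uniform and the groups differ.

    Assumption of this sketch: sites with any missing call in either group
    are skipped rather than tolerated.
    """
    hits = []
    for variant_id, calls in genotypes.items():
        alleles_a = {calls.get(strain) for strain in group_a}
        alleles_b = {calls.get(strain) for strain in group_b}
        if None in alleles_a or None in alleles_b:
            continue  # missing genotype: cannot decide compatibility
        if len(alleles_a) == 1 and len(alleles_b) == 1 and alleles_a != alleles_b:
            hits.append(variant_id)
    return hits


def filter_by_impact(
    variants: List[str], impact: Dict[str, str], keep: Set[str]
) -> List[str]:
    """Keep only variants whose annotated impact is in `keep`."""
    return [v for v in variants if impact.get(v) in keep]


def candidate_genes(variants: List[str], gene_of: Dict[str, str]) -> List[str]:
    """Collapse a variant list into a sorted list of unique gene names."""
    return sorted({gene_of[v] for v in variants if v in gene_of})


if __name__ == "__main__":
    with_phenotype = ["S1", "S2"]
    without_phenotype = ["S3", "S4"]
    hits = compatible_variants(TOY_GENOTYPES, with_phenotype, without_phenotype)
    strong = filter_by_impact(hits, TOY_IMPACT, {"HIGH", "MODERATE"})
    print(hits, strong, candidate_genes(strong, TOY_GENE))
```

With the toy data, only sites where both groups are uniform and carry different alleles survive the first step. The impact filter then narrows that list, and the gene step deduplicates variants that fall in the same gene. A practical tool would additionally need to handle heterozygous calls, tolerate a configurable number of discordant strains, and query an indexed variant store rather than an in-memory dictionary.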