Department of Plant and Microbial Biology & Zurich-Basel Plant Science Center, University of Zurich, Zollikerstrasse 107, CH-8008, Zurich, Switzerland.
Centre for Organismal Studies, Heidelberg University, Im Neuenheimer Feld 230, 69120, Heidelberg, Germany.
Sci Rep. 2019 Feb 4;9(1):1320. doi: 10.1038/s41598-018-36768-4.
Genomic imprinting leads to different expression levels of maternally and paternally derived alleles. In recent years, affordable next-generation sequencing technologies have enabled major progress in identifying novel imprinted candidate genes in plants. However, reports based on transcriptome sequencing of hybrid F1 seed tissues strongly disagree about how many and which genes are imprinted. This raises questions about the relative impact of biological, environmental, technical, and analytic differences or biases. Here, we adopt a statistical approach, frequently used in RNA-seq data analysis, which properly models count overdispersion and considers replicate information of reciprocal crosses. We show that our statistical pipeline outperforms other methods in identifying imprinted genes in simulated and real data. Reanalysis of genome-wide imprinting studies in Arabidopsis and maize with this pipeline shows increased agreement across datasets, at least for Arabidopsis. For maize, however, consistent reanalysis did not yield a larger overlap between the datasets. This suggests that the discrepancy across publications might be partially due to different analysis pipelines but that technical, biological, and environmental factors underlie much of the discrepancy between datasets. Finally, we show that the set of genes that can be characterized regarding allelic bias by all studies with minimal confidence is small (~8,000/27,416 genes for Arabidopsis and ~12,000/39,469 for maize). In conclusion, we propose to use biologically replicated reciprocal crosses, high sequence coverage, and a generalized linear model approach to identify differentially expressed alleles in developing seeds.
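The two ingredients named above, count overdispersion and replicated reciprocal crosses, can be illustrated with a minimal Python sketch. It is not the pipeline described in the abstract and does not fit a generalized linear model. It only shows, under simplifying assumptions, (a) a method-of-moments estimate of the negative binomial dispersion from replicate counts, using the standard relation var = mu + phi * mu^2, and (b) why reciprocal crosses are needed to separate a parent-of-origin effect from a strain (allele-specific) effect. The function names, the pseudocount of 0.5, and the strain labels A and B are hypothetical choices made for illustration.

```python
import math
from statistics import mean, variance


def nb_dispersion_moments(counts):
    """Method-of-moments estimate of the negative binomial dispersion phi.

    Uses var = mu + phi * mu**2, so phi = (var - mu) / mu**2.
    Returns 0.0 when the replicates show no more variance than a
    Poisson model would (var <= mu). Needs at least two replicates.
    """
    mu = mean(counts)
    if mu == 0:
        return 0.0
    var = variance(counts)  # sample variance across replicates
    return max(0.0, (var - mu) / mu ** 2)


def parent_and_strain_effects(cross_ab, cross_ba, pseudo=0.5):
    """Split allelic log2 ratios into parent-of-origin and strain effects.

    cross_ab: replicates of (reads_A, reads_B) where strain A is the mother.
    cross_ba: replicates of (reads_A, reads_B) where strain B is the mother.
    Assumed model (illustrative): log2(maternal / paternal) equals
    parent + strain in A x B and parent - strain in B x A.
    """
    # Maternal allele is A in the first cross and B in the reciprocal cross.
    lr_ab = mean(math.log2((a + pseudo) / (b + pseudo)) for a, b in cross_ab)
    lr_ba = mean(math.log2((b + pseudo) / (a + pseudo)) for a, b in cross_ba)
    parent = (lr_ab + lr_ba) / 2  # > 0 means maternal bias in both directions
    strain = (lr_ab - lr_ba) / 2  # > 0 means allele A favored regardless of parent
    return parent, strain


# Maternal allele dominates in both cross directions: parent-of-origin effect.
print(parent_and_strain_effects([(80, 20), (90, 30)], [(20, 80), (30, 90)]))
# Allele A dominates in both cross directions: strain effect, not imprinting.
print(parent_and_strain_effects([(80, 20)], [(80, 20)]))
```

A single cross direction cannot distinguish the two printed cases, because in both the A allele dominates in the A x B cross. A full analysis would replace these averages with a model fit that also accounts for the estimated dispersion when testing significance.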