Muyle Aline, Käfer Jos, Zemp Niklaus, Mousset Sylvain, Picard Franck, Marais Gabriel Ab
Laboratoire de Biométrie et Biologie Evolutive (UMR 5558), CNRS/Université Lyon 1, Villeurbanne, France
Genome Biol Evol. 2016 Aug 29;8(8):2530-43. doi: 10.1093/gbe/evw172.
We propose a probabilistic framework to infer autosomal and sex-linked genes from RNA-seq data of a cross for any sex chromosome type (XY, ZW, and UV). Sex chromosomes (especially the non-recombining and repeat-dense Y, W, U, and V) are notoriously difficult to sequence. Strategies have been developed to obtain partially assembled sex chromosome sequences, but most of them remain difficult to apply to numerous non-model organisms, either because they require a reference genome or because they are designed for evolutionarily old systems. Sequencing a cross (parents and progeny) by RNA-seq to study the segregation of alleles and infer sex-linked genes is a cost-efficient strategy, which also provides expression level estimates. However, the lack of a proper statistical framework has limited a broader application of this approach. Tests on empirical Silene data show that our method identifies 20-35% more sex-linked genes than existing pipelines, while making reliable inferences for downstream analyses. Simulations indicate that approximately 12 individuals are needed for optimal results. For species with an unknown sex-determination system, the method can assess the presence and type (XY vs. ZW) of sex chromosomes through a model comparison strategy. The method is particularly well optimized for sex chromosomes of young or intermediate age, which are expected in thousands of yet unstudied lineages. Any organism that can be bred in the lab, including non-model organisms for which nothing is known a priori, is suitable for our method. SEX-DETector and its implementation in a Galaxy workflow are made freely available.
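The abstract does not spell out SEX-DETector's model, so the following is only a toy illustration of the segregation logic that such a model comparison rests on, not the published method. It scores a single SNP under autosomal, XY and ZW inheritance, given known parental genotypes and sexed progeny genotypes. Under XY, daughters receive the father's X allele and sons his Y allele; under ZW, the mother plays that role. All function names (`transmission`, `compare_models`, etc.) and the uniform genotyping-error weight `eps` are hypothetical. The sketch also leaves out read counts, X-hemizygous genes and multi-SNP aggregation.

```python
# Toy illustration only: NOT SEX-DETector's model. All names are hypothetical.
import math
from collections import Counter
from itertools import product


def geno(a, b):
    """Unordered genotype represented as a sorted allele pair."""
    return tuple(sorted((a, b)))


def transmission(paternal, maternal):
    """Child genotype distribution when one allele is drawn
    uniformly from each parental allele list."""
    dist = Counter()
    w = 1.0 / (len(paternal) * len(maternal))
    for p, m in product(paternal, maternal):
        dist[geno(p, m)] += w
    return dist


def log_lik(children, dists, eps, n_geno):
    """Sum of log P(observed genotype), mixing the segregation model
    with a uniform genotyping-error term of weight eps (an assumption)."""
    total = 0.0
    for sex, g in children:
        p = dists[sex].get(g, 0.0)
        total += math.log((1 - eps) * p + eps / n_geno)
    return total


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def compare_models(father, mother, children, eps=0.01):
    """Log-likelihoods of one SNP under autosomal, XY and ZW segregation.

    father, mother: allele pairs; children: list of (sex, genotype)
    with sex in {"M", "F"}.
    """
    children = [(s, geno(*g)) for s, g in children]
    alleles = set(father) | set(mother) | {a for _, g in children for a in g}
    n_geno = len(alleles) * (len(alleles) + 1) // 2
    auto = transmission(father, mother)
    scores = {"autosomal": log_lik(children, {"M": auto, "F": auto}, eps, n_geno)}
    # XY: daughters get the father's X allele, sons his Y allele. Which
    # paternal allele is Y-linked (the phase) is unknown, so average over it.
    xy = [log_lik(children, {"F": transmission([x], mother),
                             "M": transmission([y], mother)}, eps, n_geno)
          for x, y in {(father[0], father[1]), (father[1], father[0])}]
    scores["XY"] = log_mean_exp(xy)
    # ZW: mirror image, with the mother as the heterogametic parent.
    zw = [log_lik(children, {"M": transmission(father, [z]),
                             "F": transmission(father, [w])}, eps, n_geno)
          for z, w in {(mother[0], mother[1]), (mother[1], mother[0])}]
    scores["ZW"] = log_mean_exp(zw)
    return scores


father, mother = ("A", "G"), ("A", "A")
kids = [("M", ("G", "A"))] * 6 + [("F", ("A", "A"))] * 6
scores = compare_models(father, mother, kids)
print(max(scores, key=scores.get))  # XY
```

In this example every son carries the paternal G allele and no daughter does, which is the signature of a Y-linked allele. Note that because the mother is homozygous, this SNP cannot discriminate ZW from autosomal inheritance: the two scores are identical. A single SNP is informative about a given sex-linkage model only when the candidate heterogametic parent is heterozygous.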