Michalovova Monika, Kubat Zdenek, Hobza Roman, Vyskot Boris, Kejnovsky Eduard
Department of Plant Developmental Genetics, Institute of Biophysics, Academy of Sciences of the Czech Republic, Kralovopolska 135, CZ-61200, Brno, Czech Republic.
Current address: Department of Biology, Pennsylvania State University, University Park, PA, 16802, USA.
BMC Bioinformatics. 2015 Mar 11;16(1):78. doi: 10.1186/s12859-015-0509-0.
Sex chromosomes present a genomic region that, to some extent, differs between the sexes of a single species. Reliable high-throughput methods for detecting sex chromosome-specific markers are needed, especially in species where genome information is limited. Next-generation sequencing (NGS) opens the door to identifying unique sequences or searching for nucleotide polymorphisms between datasets. Combining classical genetic segregation analysis with RNA-Seq data can provide an ideal tool for mapping and identifying sex chromosome-specific expressed markers. To address this challenge, we established a genetic cross of the dioecious plant Rumex acetosa and generated RNA-Seq data from both the parental generation and the male and female offspring.
We present a pipeline for detecting sex-linked genes based on nucleotide polymorphism analysis. In our approach, nucleotide polymorphisms are tracked through a cross between, preferably, distant populations. As a result, only four datasets are needed: reads from high-throughput sequencing platforms for the parental generation (mother and father) and for the F1 generation (male and female progeny). The pipeline combines custom scripts with external assembly, mapping and variant-calling software. Because the computation is resource-intensive, high-capacity servers are required. To keep the pipeline easily accessible and reproducible, we therefore implemented it in Galaxy, an open, web-based platform for data-intensive biomedical research. Our tools are available in the Galaxy Tool Shed, from which they can be installed into any local Galaxy instance. As output, the pipeline provides a FASTA file of candidate transcriptionally active sex-linked genes, sorted by relevance, together with a BAM file containing the identified genes and the read alignments. Polymorphisms that follow the segregation pattern can thus be easily visualized, which greatly facilitates primer design and subsequent wet-lab verification.
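The idea of tracking polymorphisms through the cross can be illustrated with a minimal Python sketch. It assumes a simple XX/XY system, in which the father passes his Y to every son and his X to every daughter, and it assumes that variant calling has already reduced each variable contig position to the set of alleles observed in each of the four datasets. The `Site` record, the `classify` and `rank_contigs` functions, and the ranking by number of supporting sites are hypothetical illustrations, not the pipeline's actual scripts or its relevance score. A paternal allele absent from the mother that appears only in sons is flagged as Y-linked; one that appears only in daughters is flagged as X-linked.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Alleles observed at one variable position of a contig (hypothetical record)."""
    contig: str
    position: int
    mother: frozenset
    father: frozenset
    males: frozenset    # pooled male F1 progeny
    females: frozenset  # pooled female F1 progeny


def classify(site):
    """Return 'Y-linked', 'X-linked' or None, assuming a simple XX/XY system."""
    # Only alleles the father carries and the mother lacks are informative here.
    for allele in sorted(site.father - site.mother):
        in_sons = allele in site.males
        in_daughters = allele in site.females
        if in_sons and not in_daughters:
            return "Y-linked"   # paternal allele inherited by sons only
        if in_daughters and not in_sons:
            return "X-linked"   # paternal allele inherited by daughters only
    return None                 # no sex-specific segregation (e.g. autosomal)


def rank_contigs(sites):
    """Count sex-linked sites per contig and list contigs, most-supported first."""
    counts = Counter()
    for site in sites:
        if classify(site) is not None:
            counts[site.contig] += 1
    return counts.most_common()


sites = [
    Site("c1", 10, frozenset("A"), frozenset("AG"), frozenset("AG"), frozenset("A")),
    Site("c1", 52, frozenset("C"), frozenset("CT"), frozenset("CT"), frozenset("C")),
    Site("c2", 7, frozenset("T"), frozenset("TC"), frozenset("T"), frozenset("TC")),
    Site("c3", 3, frozenset("G"), frozenset("GA"), frozenset("GA"), frozenset("GA")),
]
print(classify(sites[0]))   # Y-linked
print(classify(sites[2]))   # X-linked
print(classify(sites[3]))   # None
print(rank_contigs(sites))  # [('c1', 2), ('c2', 1)]
```

With real pooled RNA-Seq data, "present" and "absent" would have to be decided from read counts and quality thresholds rather than from exact allele sets, which this sketch deliberately leaves out.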
Our pipeline is a simple, freely accessible software tool for identifying sex chromosome-linked genes in species without an existing reference genome. By combining genetic crosses with RNA-Seq data, we have designed a high-throughput, cost-effective approach for the broad community of scientists studying sex chromosome structure and evolution.