Fully automated pipeline for detection of sex linked genes using RNA-Seq data.

Authors

Michalovova Monika, Kubat Zdenek, Hobza Roman, Vyskot Boris, Kejnovsky Eduard

Affiliations

Department of Plant Developmental Genetics, Institute of Biophysics, Academy of Sciences of the Czech Republic, Kralovopolska 135, CZ-61200, Brno, Czech Republic.

Current address: Department of Biology, Pennsylvania State University, University Park, PA, 16802, USA.

Publication information

BMC Bioinformatics. 2015 Mar 11;16(1):78. doi: 10.1186/s12859-015-0509-0.

DOI:10.1186/s12859-015-0509-0

PMID:25884927

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4367819/

Abstract

BACKGROUND

Sex chromosomes represent a genomic region that, to some extent, differs between the sexes of a single species. Reliable high-throughput methods for detecting sex chromosome-specific markers are needed, especially in species where genome information is limited. Next generation sequencing (NGS) opens the door to identifying unique sequences and to searching for nucleotide polymorphisms between datasets. Combining classical genetic segregation analysis with RNA-Seq data can provide an ideal tool to map and identify sex chromosome-specific expressed markers. To address this challenge, we established a genetic cross of the dioecious plant Rumex acetosa and generated RNA-Seq data from both the parental generation and the male and female offspring.

RESULTS

We present a pipeline for detection of sex-linked genes based on nucleotide polymorphism analysis. In our approach, nucleotide polymorphisms are tracked through a cross of preferably distant populations. For this reason, only 4 datasets are needed: reads from high-throughput sequencing platforms for the parental generation (mother and father) and the F1 generation (male and female progeny). Our pipeline uses custom scripts together with external assembly, mapping and variant calling software. Because the computation is resource-intensive, high-capacity servers are required. Therefore, to keep the pipeline easily accessible and reproducible, we implemented it in Galaxy, an open, web-based platform for data-intensive biomedical research. Our tools are available in the Galaxy Tool Shed, from which they can be installed into any local Galaxy instance. As output, the user gets a FASTA file with candidate transcriptionally active sex-linked genes, sorted by their relevance. A BAM file with the identified genes and the alignment of reads is also provided. Polymorphisms that follow the segregation pattern can thus be easily visualized, which significantly facilitates primer design and the subsequent steps of wet-lab verification.
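The abstract does not spell out the segregation rules the pipeline applies, so the following is only a minimal Python sketch of the general idea, under assumptions of our own. It assumes an XY system in which a Y-like variant is seen in the father and sons only, and an X-like variant is a paternal allele seen in daughters but not in sons. The input structure, the function names `classify_site` and `rank_contigs`, and the use of the site count as a stand-in for "relevance" are all hypothetical and are not taken from the published tools.

```python
from collections import Counter

# The four datasets described in the abstract (names chosen for illustration).
SAMPLES = ("mother", "father", "daughters", "sons")


def classify_site(alt_present):
    """Classify one variant site by which datasets carry the alternative allele.

    alt_present maps each sample name to True/False.
    Returns "Y-like", "X-like", or None (assumed rules, see lead-in).
    """
    pattern = tuple(alt_present[s] for s in SAMPLES)
    # Allele seen only in the father and sons: consistent with Y linkage.
    if pattern == (False, True, False, True):
        return "Y-like"
    # Paternal allele seen in daughters but not in sons: consistent with
    # transmission of the father's X chromosome.
    if pattern == (False, True, True, False):
        return "X-like"
    return None


def rank_contigs(sites_by_contig):
    """Count consistent sites per contig and sort contigs by that count."""
    scores = {}
    for contig, sites in sites_by_contig.items():
        counts = Counter(classify_site(site) for site in sites)
        counts.pop(None, None)  # discard sites that match no pattern
        if counts:
            label, n = counts.most_common(1)[0]
            scores[contig] = (label, n)
    # Contigs with more supporting sites come first.
    return sorted(scores.items(), key=lambda kv: kv[1][1], reverse=True)


# Made-up example data: three contigs with presence/absence calls per site.
example = {
    "contig_1": [
        {"mother": False, "father": True, "daughters": False, "sons": True},
        {"mother": False, "father": True, "daughters": False, "sons": True},
    ],
    "contig_2": [
        {"mother": False, "father": True, "daughters": True, "sons": False},
    ],
    "contig_3": [
        {"mother": True, "father": True, "daughters": True, "sons": True},
    ],
}

ranked = rank_contigs(example)
print(ranked)
# [('contig_1', ('Y-like', 2)), ('contig_2', ('X-like', 1))]
```

In practice the presence/absence calls would come from the variant caller's output for each dataset, and a real implementation would also need to account for coverage and heterozygosity; those steps are left out of this sketch.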

CONCLUSIONS

Our pipeline provides a simple and freely accessible software tool for identifying sex chromosome-linked genes in species without an existing reference genome. Based on a combination of genetic crosses and RNA-Seq data, we have designed a high-throughput, cost-effective approach for a broad community of scientists focused on sex chromosome structure and evolution.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/222d/4367819/f7b75fb27584/12859_2015_509_Fig1_HTML.jpg

Similar articles

Fully automated pipeline for detection of sex linked genes using RNA-Seq data.

BMC Bioinformatics. 2015 Mar 11;16(1):78. doi: 10.1186/s12859-015-0509-0.

QuickRNASeq lifts large-scale RNA-seq data analyses to the next level of automation and interactive visualization.

BMC Genomics. 2016 Jan 8;17:39. doi: 10.1186/s12864-015-2356-9.

mInDel: a high-throughput and efficient pipeline for genome-wide InDel marker development.

BMC Genomics. 2016 Apr 14;17:290. doi: 10.1186/s12864-016-2614-5.

SPARTA: Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis.

BMC Bioinformatics. 2016 Feb 4;17:66. doi: 10.1186/s12859-016-0923-y.

CANEapp: a user-friendly application for automated next generation transcriptomic data analysis.

BMC Genomics. 2016 Jan 13;17:49. doi: 10.1186/s12864-015-2346-y.

SimBA: A methodology and tools for evaluating the performance of RNA-Seq bioinformatic pipelines.

BMC Bioinformatics. 2017 Sep 29;18(1):428. doi: 10.1186/s12859-017-1831-5.

Grape RNA-Seq analysis pipeline environment.

Bioinformatics. 2013 Mar 1;29(5):614-21. doi: 10.1093/bioinformatics/btt016. Epub 2013 Jan 17.

SEXCMD: Development and validation of sex marker sequences for whole-exome/genome and RNA sequencing.

PLoS One. 2017 Sep 8;12(9):e0184087. doi: 10.1371/journal.pone.0184087. eCollection 2017.

TRAPLINE: a standardized and automated pipeline for RNA sequencing data analysis, evaluation and annotation.

BMC Bioinformatics. 2016 Jan 6;17:21. doi: 10.1186/s12859-015-0873-9.

UNDR ROVER - a fast and accurate variant caller for targeted DNA sequencing.

BMC Bioinformatics. 2016 Apr 16;17:165. doi: 10.1186/s12859-016-1014-9.

Cited by

Sex Chromosome Evolution: Hallmarks and Question Marks.

Mol Biol Evol. 2024 Nov 1;41(11). doi: 10.1093/molbev/msae218.

Sexy ways: approaches to studying plant sex chromosomes.

J Exp Bot. 2024 Sep 11;75(17):5204-5219. doi: 10.1093/jxb/erae173.

Dosage compensation evolution in plants: theories, controversies and mechanisms.

Philos Trans R Soc Lond B Biol Sci. 2022 May 9;377(1850):20210222. doi: 10.1098/rstb.2021.0222. Epub 2022 Mar 21.

Characterization of a Sex-Determining Region and Its Genomic Context via Statistical Estimates of Haplotype Frequencies in Daughters and Sons Sequenced in Pools.

Genome Biol Evol. 2021 Aug 3;13(8). doi: 10.1093/gbe/evab121.

Fundamentally different repetitive element composition of sex chromosomes in Rumex acetosa.

Ann Bot. 2021 Jan 1;127(1):33-47. doi: 10.1093/aob/mcaa160.

Evidence for Dosage Compensation in Coccinia grandis, a Plant with a Highly Heteromorphic XY System.

Genes (Basel). 2020 Jul 13;11(7):787. doi: 10.3390/genes11070787.

Evolution of sex determination and heterogamety changes in section Otites of the genus Silene.

Sci Rep. 2019 Jan 31;9(1):1045. doi: 10.1038/s41598-018-37412-x.

Impact of Repetitive Elements on the Y Chromosome Formation in Plants.

Genes (Basel). 2017 Nov 1;8(11):302. doi: 10.3390/genes8110302.

The Evolution of Sex Chromosomes and Dosage Compensation in Plants.

Genome Biol Evol. 2017 Mar 1;9(3):627-645. doi: 10.1093/gbe/evw282.

SEX-DETector: A Probabilistic Approach to Study Sex Chromosomes in Non-Model Organisms.

Genome Biol Evol. 2016 Aug 29;8(8):2530-43. doi: 10.1093/gbe/evw172.

References

Genetic degeneration of old and young Y chromosomes in the flowering plant Rumex hastatulus.

Proc Natl Acad Sci U S A. 2014 May 27;111(21):7713-8. doi: 10.1073/pnas.1319227111. Epub 2014 May 13.

Contrasting patterns of transposable element and satellite distribution on sex chromosomes (XY1Y2) in the dioecious plant Rumex acetosa.

Genome Biol Evol. 2013;5(4):769-82. doi: 10.1093/gbe/evt049.

A comparison of methods for differential expression analysis of RNA-seq data.

BMC Bioinformatics. 2013 Mar 9;14:91. doi: 10.1186/1471-2105-14-91.

Rapid de novo evolution of X chromosome dosage compensation in Silene latifolia, a plant with young sex chromosomes.

PLoS Biol. 2012;10(4):e1001308. doi: 10.1371/journal.pbio.1001308. Epub 2012 Apr 17.

Plant Y chromosome degeneration is retarded by haploid purifying selection.

Curr Biol. 2011 Sep 13;21(17):1475-9. doi: 10.1016/j.cub.2011.07.045.

Full-length transcriptome assembly from RNA-Seq data without a reference genome.

Nat Biotechnol. 2011 May 15;29(7):644-52. doi: 10.1038/nbt.1883.

Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences.

Genome Biol. 2010;11(8):R86. doi: 10.1186/gb-2010-11-8-r86. Epub 2010 Aug 25.

Galaxy: a web-based genome analysis tool for experimentalists.

Curr Protoc Mol Biol. 2010 Jan;Chapter 19:Unit 19.10.1-21. doi: 10.1002/0471142727.mb1910s89.

The Sequence Alignment/Map format and SAMtools.

Bioinformatics. 2009 Aug 15;25(16):2078-9. doi: 10.1093/bioinformatics/btp352. Epub 2009 Jun 8.

Fast and accurate short read alignment with Burrows-Wheeler transform.

Bioinformatics. 2009 Jul 15;25(14):1754-60. doi: 10.1093/bioinformatics/btp324. Epub 2009 May 18.
