

Homoeologous gene expression and co-expression network analyses and evolutionary inference in allopolyploids.

Affiliation

Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA.

Publication information

Brief Bioinform. 2021 Mar 22;22(2):1819-1835. doi: 10.1093/bib/bbaa035.

Abstract

Polyploidy is a widespread phenomenon throughout eukaryotes. Due to the coexistence of duplicated genomes, polyploids offer unique challenges for estimating gene expression levels, which is essential for understanding the massive and various forms of transcriptomic responses accompanying polyploidy. Although previous studies have explored the bioinformatics of polyploid transcriptomic profiling, the causes and consequences of inaccurate quantification of transcripts from duplicated gene copies have not been addressed. Using transcriptomic data from the cotton genus (Gossypium) as an example, we present an analytical workflow to evaluate a variety of bioinformatic method choices at different stages of RNA-seq analysis, from homoeolog expression quantification to downstream analysis used to infer key phenomena of polyploid expression evolution. In general, EAGLE-RC and GSNAP-PolyCat outperform other quantification pipelines tested, and their derived expression dataset best represents the expected homoeolog expression and co-expression divergence. The performance of co-expression network analysis was less affected by homoeolog quantification than by network construction methods, where weighted networks outperformed binary networks. By examining the extent and consequences of homoeolog read ambiguity, we illuminate the potential artifacts that may affect our understanding of duplicate gene expression, including an overestimation of homoeolog co-regulation and the incorrect inference of subgenome asymmetry in network topology. Taken together, our work points to a set of reasonable practices that we hope are broadly applicable to the evolutionary exploration of polyploids.
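
To make the contrast between binary and weighted co-expression networks concrete, the following is a minimal sketch, not the workflow used in the study. It assumes a genes-by-samples expression matrix, Pearson correlation as the similarity measure, a hard threshold for the binary network, and a WGCNA-style soft-thresholding power for the weighted network. The function name, the parameter values (`hard_threshold=0.8`, `soft_power=6`), and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def coexpression_networks(expr, hard_threshold=0.8, soft_power=6):
    """Build binary and weighted adjacency matrices from expression data.

    expr: 2-D array, rows are genes (or homoeologs), columns are samples.
    hard_threshold and soft_power are illustrative defaults (assumptions).
    """
    abs_cor = np.abs(np.corrcoef(expr))  # gene-by-gene |Pearson r|
    # Binary network: an edge exists only if |r| passes the hard threshold.
    binary = (abs_cor >= hard_threshold).astype(int)
    # Weighted network: every pair keeps a continuous weight, |r|^power.
    weighted = abs_cor ** soft_power
    # Remove self-connections in both networks.
    np.fill_diagonal(binary, 0)
    np.fill_diagonal(weighted, 0.0)
    return binary, weighted

# Toy data: two strongly co-expressed copies and one unrelated gene.
rng = np.random.default_rng(0)
base = rng.normal(size=12)
expr = np.vstack([
    base + rng.normal(scale=0.1, size=12),  # hypothetical homoeolog copy 1
    base + rng.normal(scale=0.1, size=12),  # hypothetical homoeolog copy 2
    rng.normal(size=12),                    # hypothetical unrelated gene
])
binary, weighted = coexpression_networks(expr)
print(binary)
print(np.round(weighted, 3))
```

In the binary network, a pair of genes is either connected or not, so a small quantification error near the threshold can add or remove an edge. The weighted network retains the continuous strength of every pair instead.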

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/769a/7986634/9063607415fe/bbaa035f1.jpg
