LSTrAP: efficiently combining RNA sequencing data into co-expression networks.

Author information

Proost Sebastian, Krawczyk Agnieszka, Mutwil Marek

Affiliation

Max-Planck Institute for Molecular Plant Physiology, Am Muehlenberg 1, 14476, Potsdam, Germany.

Publication information

BMC Bioinformatics. 2017 Oct 10;18(1):444. doi: 10.1186/s12859-017-1861-z.

DOI: 10.1186/s12859-017-1861-z

PMID: 29017446

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5634843/

Abstract

BACKGROUND

Since experimental elucidation of gene function is often laborious, various in silico methods have been developed to predict gene function of uncharacterized genes. Since functionally related genes are often expressed in the same tissues, conditions and developmental stages (co-expressed), functional annotation of characterized genes can be transferred to co-expressed genes lacking annotation. With genome-wide expression data available, the construction of co-expression networks, where genes are nodes and edges connect significantly co-expressed genes, provides unprecedented opportunities to predict gene function. However, the construction of such networks requires large volumes of high-quality data, multiple processing steps and a considerable amount of computation power. While efficient tools exist to process RNA-Seq data, pipelines which combine them to construct co-expression networks efficiently are currently lacking.
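The network definition above (genes as nodes, edges between significantly co-expressed genes) can be illustrated with a minimal sketch. It assumes Pearson correlation as the co-expression measure and a fixed cutoff of 0.9 as the significance criterion; both are illustrative choices that the abstract does not specify, and the gene names and expression values are invented toy data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def build_network(expression, threshold=0.9):
    """Return {gene: set of neighbours} for gene pairs with r >= threshold.

    The threshold is an assumed cutoff, not a value taken from LSTrAP.
    """
    network = {gene: set() for gene in expression}
    for gene_a, gene_b in combinations(expression, 2):
        if pearson(expression[gene_a], expression[gene_b]) >= threshold:
            network[gene_a].add(gene_b)
            network[gene_b].add(gene_a)
    return network


# Toy expression matrix: hypothetical genes measured in five samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],  # rises together with geneA
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],  # unrelated pattern
}
network = build_network(expression)
```

Comparing every gene pair scales quadratically with the number of genes, which is one reason genome-wide networks need the computing power mentioned above.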

RESULTS

LSTrAP (Large-Scale Transcriptome Analysis Pipeline), presented here, combines all essential tools to construct co-expression networks based on RNA-Seq data into a single, efficient workflow. Because it supports parallel computing on computer cluster infrastructure, LSTrAP makes processing hundreds of samples feasible, as shown here for Arabidopsis thaliana and Sorghum bicolor datasets comprising 876 and 215 samples, respectively. The former was used to show how the quality control included in LSTrAP can detect spurious or low-quality samples. The latter was used to show how co-expression networks group known photosynthesis genes and implicate several currently uncharacterized genes in this process.
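One way a co-expression network can point to a role for an uncharacterized gene is to look at the annotations of its neighbours. The sketch below assumes a simple majority vote; the function `predict_function`, the `min_fraction` parameter, and all gene names are hypothetical and are not part of LSTrAP.

```python
from collections import Counter


def predict_function(gene, network, annotations, min_fraction=0.5):
    """Suggest a label for `gene` from its annotated network neighbours.

    Returns the most common neighbour annotation if it covers at least
    `min_fraction` of the annotated neighbours, otherwise None.
    """
    labels = [annotations[n] for n in network.get(gene, ()) if n in annotations]
    if not labels:
        return None  # no annotated neighbours, so nothing to transfer
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= min_fraction else None


# Toy network: hypothetical gene identifiers and annotations.
network = {
    "unknown1": {"gene1", "gene2", "gene3"},
    "unknown2": {"unknown1"},
}
annotations = {
    "gene1": "photosynthesis",
    "gene2": "photosynthesis",
    "gene3": "cell wall",
}

prediction = predict_function("unknown1", network, annotations)
no_prediction = predict_function("unknown2", network, annotations)
```

A prediction made this way is a hypothesis for experimental follow-up, not a confirmed function.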

CONCLUSIONS

LSTrAP combines the most popular and performant methods to construct co-expression networks from RNA-Seq data into a single workflow. This allows large amounts of expression data, required to construct co-expression networks, to be processed efficiently and consistently across hundreds of samples. LSTrAP is implemented in Python 3.4 (or higher) and available under MIT license from https://github.molgen.mpg.de/proost/LSTrAP.
