Integrating massive RNA-seq data to elucidate transcriptome dynamics in Drosophila melanogaster.

Affiliations

Hubei Hongshan Laboratory, College of Biomedicine and Health, Huazhong Agricultural University, Wuhan 430070, China.

Section of Developmental Genomics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, USA.

Publication information

Brief Bioinform. 2023 Jul 20;24(4). doi: 10.1093/bib/bbad177.

Abstract

The volume of RNA sequencing (RNA-seq) data has increased exponentially, providing numerous new insights into various biological processes. However, due to significant practical challenges, such as data heterogeneity, it is still difficult to ensure the quality of these data when they are integrated. Although some quality control methods have been developed, sample consistency is rarely considered and these methods are susceptible to artificial factors. Here, we developed MassiveQC, an unsupervised machine learning-based approach, to automatically download and filter large-scale high-throughput data. In addition to the read quality used in other tools, MassiveQC also uses the alignment and expression quality as model features. Meanwhile, it is user-friendly since the cutoff is generated from self-reporting and is applicable to multimodal data. To explore its value, we applied MassiveQC to Drosophila RNA-seq data and generated a comprehensive transcriptome atlas across 28 tissues from embryogenesis to adulthood. We systematically characterized fly gene expression dynamics and found that genes with high expression dynamics were likely to be evolutionarily young and expressed at late developmental stages, exhibiting high nonsynonymous substitution rates and low phenotypic severity, and they were involved in simple regulatory programs. We also discovered that human and Drosophila had strong positive correlations in gene expression in orthologous organs, revealing the great potential of the Drosophila system for studying human development and disease.
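The idea of filtering samples on read, alignment and expression quality without labeled training data can be illustrated with a minimal sketch. This is not MassiveQC's actual model or API: the abstract does not specify the algorithm, so the sketch swaps in a simple robust z-score rule (median and median absolute deviation). The feature names, the sample values and the cutoff of 3.5 are all hypothetical, chosen only for illustration.

```python
from statistics import median


def robust_z(values):
    """Return robust z-scores based on the median and the MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median: treat every value as typical.
        return [0.0 for _ in values]
    # 1.4826 rescales the MAD to the standard deviation for normal data.
    return [(v - med) / (1.4826 * mad) for v in values]


def flag_outliers(samples, features, cutoff=3.5):
    """Return the ids of samples that are outliers on any feature.

    The cutoff is a hypothetical, fixed value; it is NOT the
    data-derived cutoff described for MassiveQC.
    """
    flagged = set()
    for feat in features:
        scores = robust_z([s[feat] for s in samples])
        for sample, z in zip(samples, scores):
            if abs(z) > cutoff:
                flagged.add(sample["id"])
    return flagged


# Hypothetical per-sample QC metrics (one read-level, one alignment-level,
# one expression-level feature), invented for this example.
features = ["mean_base_quality", "pct_aligned", "pct_genes_detected"]
samples = [
    {"id": "S1", "mean_base_quality": 36.1, "pct_aligned": 92.0, "pct_genes_detected": 61.0},
    {"id": "S2", "mean_base_quality": 35.8, "pct_aligned": 90.5, "pct_genes_detected": 59.5},
    {"id": "S3", "mean_base_quality": 36.4, "pct_aligned": 93.2, "pct_genes_detected": 62.3},
    {"id": "S4", "mean_base_quality": 35.9, "pct_aligned": 91.1, "pct_genes_detected": 60.2},
    {"id": "S5", "mean_base_quality": 35.5, "pct_aligned": 35.0, "pct_genes_detected": 22.0},
    {"id": "S6", "mean_base_quality": 36.0, "pct_aligned": 89.8, "pct_genes_detected": 60.8},
]

flagged = flag_outliers(samples, features)
kept = [s["id"] for s in samples if s["id"] not in flagged]
print("flagged:", sorted(flagged))
print("kept:", kept)
```

In this toy data, sample S5 has acceptable base quality but poor alignment and gene detection, so a filter that looked at read quality alone would keep it. That is the motivation the abstract gives for adding alignment and expression features.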
