Data-driven human transcriptomic modules determined by independent component analysis.

Affiliations

Department of Bioengineering, Stanford University, Stanford, CA, 94305, USA.

Department of Genetics, Stanford University, Stanford, CA, 94305, USA.

Publication information

BMC Bioinformatics. 2018 Sep 17;19(1):327. doi: 10.1186/s12859-018-2338-4.

DOI:10.1186/s12859-018-2338-4

PMID:30223787

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6142401/

Abstract

BACKGROUND

Analyzing the human transcriptome is crucial in advancing precision medicine, and the plethora of over half a million human microarray samples in the Gene Expression Omnibus (GEO) has enabled us to better characterize biological processes at the molecular level. However, transcriptomic analysis is challenging because the data is inherently noisy and high-dimensional. Gene set analysis is currently widely used to alleviate the issue of high dimensionality, but the user-defined choice of gene sets can introduce bias in the results. In this paper, we advocate the use of a fixed set of transcriptomic modules for such analysis. We apply independent component analysis to the large collection of microarray data in GEO in order to discover reproducible transcriptomic modules that can be used as features for machine learning. We evaluate the usability of these modules across six studies, and demonstrate (1) their usage as features for sample classification, and also their robustness in dealing with small training sets, (2) their regularization of data when clustering samples, and (3) the biological relevance of differentially expressed features.

RESULTS

We identified 139 reproducible transcriptomic modules, which we term fundamental components (FCs). In studies with fewer than 50 samples, FC-space classification models outperformed their gene-space counterparts, with higher sensitivity (p < 0.01). The models also had higher accuracy and negative predictive value (p < 0.01) for small data sets (fewer than 30 samples). Additionally, we observed a reduction in batch effects when data were clustered in FC-space. Finally, we found that differentially expressed FCs mapped to GO terms that were also identified via traditional gene-based approaches.

CONCLUSIONS

The 139 FCs provide a biologically relevant summary of transcriptomic data, and their performance in low-sample settings suggests that they should be employed in such studies in order to harness the data efficiently.
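The abstract does not state how samples are mapped into FC-space. As a minimal sketch, assuming the modules are available as a gene-by-module weight matrix, a least-squares projection with NumPy could look like the following. The function name `project_to_modules`, the matrix shapes, and the toy data are illustrative assumptions, not the authors' code or their actual FC matrix.

```python
import numpy as np


def project_to_modules(expression, modules):
    """Estimate module activities for each sample by least squares.

    expression: (n_samples, n_genes) array, assumed to be centered the
                same way as the data the modules were learned from.
    modules:    (n_genes, n_modules) array; each column holds one
                module's gene weights (hypothetical stand-in for FCs).
    Returns an (n_samples, n_modules) array of module activities.
    """
    # Solve modules @ activities.T ~= expression.T for the activities.
    activities_t, *_ = np.linalg.lstsq(modules, expression.T, rcond=None)
    return activities_t.T


# Toy data: 3 hypothetical modules over 50 genes, measured in 8 samples.
rng = np.random.default_rng(0)
modules = rng.laplace(size=(50, 3))
true_activities = rng.normal(size=(8, 3))
expression = true_activities @ modules.T  # noise-free mixture

# Each sample is now described by 3 features instead of 50 genes.
features = project_to_modules(expression, modules)
print(features.shape)
```

The resulting low-dimensional `features` array is the kind of input a classifier or clustering routine would consume in place of the full gene-level matrix; with real, noisy data the recovered activities would only approximate the underlying ones.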

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dd67/6142401/a903ca27773e/12859_2018_2338_Fig1_HTML.jpg

Similar articles

Data-driven human transcriptomic modules determined by independent component analysis.

BMC Bioinformatics. 2018 Sep 17;19(1):327. doi: 10.1186/s12859-018-2338-4.

Exploring combinations of dimensionality reduction, transfer learning, and regularization methods for predicting binary phenotypes with transcriptomic data.

BMC Bioinformatics. 2024 Apr 26;25(1):167. doi: 10.1186/s12859-024-05795-6.

A graphical systems model and tissue-specific functional gene sets to aid transcriptomic analysis of chemical impacts on the female teleost reproductive axis.

Mutat Res. 2012 Aug 15;746(2):151-62. doi: 10.1016/j.mrgentox.2011.12.016. Epub 2011 Dec 28.

Optimal dimensionality selection for independent component analysis of transcriptomic data.

BMC Bioinformatics. 2021 Dec 8;22(1):584. doi: 10.1186/s12859-021-04497-7.

Discovering the transcriptional modules using microarray data by penalized matrix decomposition.

Comput Biol Med. 2011 Nov;41(11):1041-50. doi: 10.1016/j.compbiomed.2011.09.003. Epub 2011 Oct 14.

Effects of replacing the unreliable cDNA microarray measurements on the disease classification based on gene expression profiles and functional modules.

Bioinformatics. 2006 Dec 1;22(23):2883-9. doi: 10.1093/bioinformatics/btl339. Epub 2006 Jun 29.

Identifying key genes in rheumatoid arthritis by weighted gene co-expression network analysis.

Int J Rheum Dis. 2017 Aug;20(8):971-979. doi: 10.1111/1756-185X.13063. Epub 2017 Apr 25.

Hierarchical cortical transcriptome disorganization in autism.

Mol Autism. 2017 Jun 21;8:29. doi: 10.1186/s13229-017-0147-7. eCollection 2017.

Translational Metabolomics of Head Injury: Exploring Dysfunctional Cerebral Metabolism with Ex Vivo NMR Spectroscopy-Based Metabolite Quantification.

Identification of potential transcriptomic markers in developing pediatric sepsis: a weighted gene co-expression network analysis and a case-control validation study.

J Transl Med. 2017 Dec 13;15(1):254. doi: 10.1186/s12967-017-1364-8.

Cited by

Soft Modes as a Predictive Framework for Low-Dimensional Biological Systems Across Scales.

Annu Rev Biophys. 2025 May;54(1):401-426. doi: 10.1146/annurev-biophys-081624-030543. Epub 2025 Feb 19.

Soft Modes as a Predictive Framework for Low Dimensional Biological Systems across Scales.

ArXiv. 2024 Dec 18:arXiv:2412.13637v1.

Exploring combinations of dimensionality reduction, transfer learning, and regularization methods for predicting binary phenotypes with transcriptomic data.

BMC Bioinformatics. 2024 Apr 26;25(1):167. doi: 10.1186/s12859-024-05795-6.

CoRegNet: unraveling gene co-regulation networks from public RNA-Seq repositories using a beta-binomial statistical model.

Brief Bioinform. 2023 Nov 22;25(1). doi: 10.1093/bib/bbad380.

Predicting drug polypharmacology from cell morphology readouts using variational autoencoder latent space arithmetic.

PLoS Comput Biol. 2022 Feb 25;18(2):e1009888. doi: 10.1371/journal.pcbi.1009888. eCollection 2022 Feb.

Meta-Analysis of Esophageal Cancer Transcriptomes Using Independent Component Analysis.

Front Genet. 2021 Oct 21;12:683632. doi: 10.3389/fgene.2021.683632. eCollection 2021.

Development of a fixed module repertoire for the analysis and interpretation of blood transcriptome data.

Nat Commun. 2021 Jul 19;12(1):4385. doi: 10.1038/s41467-021-24584-w.

BloodGen3Module: blood transcriptional module repertoire analysis and visualization using R.

Bioinformatics. 2021 Aug 25;37(16):2382-2389. doi: 10.1093/bioinformatics/btab121.

Correcting for experiment-specific variability in expression compendia can remove underlying signals.

Gigascience. 2020 Nov 3;9(11). doi: 10.1093/gigascience/giaa117.

Application of Transcriptional Gene Modules to Analysis of ' Gene Expression Data.

G3 (Bethesda). 2020 Oct 5;10(10):3623-3638. doi: 10.1534/g3.120.401270.
