Determining the optimal number of independent components for reproducible transcriptomic data analysis.

Authors

Kairov Ulykbek, Cantini Laura, Greco Alessandro, Molkenov Askhat, Czerwinska Urszula, Barillot Emmanuel, Zinovyev Andrei

Affiliations

Laboratory of bioinformatics and computational systems biology, Center for Life Sciences, National Laboratory Astana, Nazarbayev University, Astana, Kazakhstan.

Institut Curie, PSL Research University, INSERM U900, Mines ParisTech, Paris, France.

Publication information

BMC Genomics. 2017 Sep 11;18(1):712. doi: 10.1186/s12864-017-4112-9.

DOI: 10.1186/s12864-017-4112-9

PMID: 28893186

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5594474/

Abstract

BACKGROUND

Independent Component Analysis (ICA) is a method that models gene expression data as the action of a set of statistically independent hidden factors. The output of ICA depends on a fundamental parameter: the number of components (factors) to compute. The optimal choice of this parameter, which is related to determining the effective data dimension, remains an open question in the application of blind source separation techniques to transcriptomic data.
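
As a reading aid, the model described above is commonly written as a linear mixture. The notation below is an assumption made for illustration, since the abstract itself defines no symbols: $X$ is the expression matrix with $n$ genes and $m$ samples, $S$ holds the $M$ independent components as gene weights, and $A$ holds the contribution of each component to each sample.

```latex
X \approx S A, \qquad
x_{ij} \approx \sum_{k=1}^{M} s_{ik}\, a_{kj}, \qquad
X \in \mathbb{R}^{n \times m},\; S \in \mathbb{R}^{n \times M},\; A \in \mathbb{R}^{M \times m}
```

Under this notation, $M$ is the number of components whose choice is the open question.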

RESULTS

Here we address the question of optimizing the number of statistically independent components in the analysis of transcriptomic data so that the components are reproducible across multiple runs of ICA (within the same or within varying effective dimensions) and across multiple independent datasets. To this end, we introduce a ranking of independent components based on their stability over multiple ICA computation runs and define a distinguished number of components (Most Stable Transcriptome Dimension, MSTD) corresponding to the point of qualitative change in the stability profile. Based on a large body of data, we demonstrate that a sufficient number of dimensions is required for biological interpretability of the ICA decomposition and that the most stable components, with ranks below MSTD, are more likely to be reproduced in independent studies than the less stable ones. At the same time, we show that a transcriptomic dataset can be reduced to a relatively high number of dimensions without losing the interpretability of ICA, even though higher dimensions give rise to components driven by small gene sets.
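
The idea of ranking components by their stability over repeated runs can be illustrated with a small sketch. This is not the authors' procedure: it assumes a simplified stability score (the mean, over the other runs, of the best absolute Pearson correlation with any component of that run), and the function names `pearson`, `stability_scores`, `rank_by_stability`, and `simulate_run` are hypothetical. The data are synthetic stand-ins for ICA output, not results from the study, and the sketch does not attempt to locate the MSTD itself.

```python
import math
import random


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / math.sqrt(var_u * var_v)


def stability_scores(runs, reference=0):
    """Score each component of the reference run by how well it recurs.

    runs: list of runs; a run is a list of components; a component is a
    list of per-gene weights. ICA fixes neither the sign nor the order of
    components, so matching uses the maximum absolute correlation.
    Requires at least two runs.
    """
    ref = runs[reference]
    others = [run for i, run in enumerate(runs) if i != reference]
    scores = []
    for comp in ref:
        best = [max(abs(pearson(comp, c)) for c in run) for run in others]
        scores.append(sum(best) / len(best))
    return scores


def rank_by_stability(runs, reference=0):
    """Return (component_index, score) pairs, most stable first."""
    scores = stability_scores(runs, reference)
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)


# Synthetic demonstration (assumed data): three underlying signals that are
# recovered with increasing amounts of noise in every simulated run.
rng = random.Random(0)
n_genes = 300
base = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(3)]
noise_sd = [0.1, 0.7, 2.0]  # signal 0 is the most faithfully reproduced


def simulate_run(shuffle=True):
    """Mimic one ICA run: noisy copies with random sign and random order."""
    comps = []
    for comp, sd in zip(base, noise_sd):
        sign = rng.choice([-1, 1])
        comps.append([sign * x + rng.gauss(0, sd) for x in comp])
    if shuffle:
        rng.shuffle(comps)
    return comps


# Run 0 keeps the original order so that indices are easy to interpret.
runs = [simulate_run(shuffle=False)] + [simulate_run() for _ in range(4)]
ranking = rank_by_stability(runs)
for index, score in ranking:
    print(f"component {index}: stability {score:.2f}")
```

In this sketch the least noisy signal receives the highest score, which mirrors the intuition that stable components are the ones worth prioritizing. A real analysis would replace `simulate_run` with repeated ICA decompositions of the expression matrix.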

CONCLUSIONS

We suggest a protocol for applying ICA to transcriptomic data that allows components to be prioritized by their reproducibility, which strengthens the biological interpretation. Computing too few components (far fewer than MSTD) is not optimal for interpretability of the results. Components ranked within the MSTD range are more likely to be reproduced in independent studies.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ec80/5594474/87e50851b88d/12864_2017_4112_Fig1_HTML.jpg
