
Mining gene expression data by interpreting principal components.

Authors

Roden Joseph C, King Brandon W, Trout Diane, Mortazavi Ali, Wold Barbara J, Hart Christopher E

Affiliation

Jet Propulsion Laboratory, California Institute of Technology, Pasadena, USA.

Publication

BMC Bioinformatics. 2006 Apr 7;7:194. doi: 10.1186/1471-2105-7-194.

DOI:10.1186/1471-2105-7-194
PMID:16600052
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1501050/
Abstract

BACKGROUND

There are many methods for analyzing microarray data that group together genes having similar patterns of expression over all conditions tested. However, in many instances the biologically important goal is to identify relatively small sets of genes that share coherent expression across only some conditions, rather than all or most conditions as required in traditional clustering; e.g. genes that are highly up-regulated and/or down-regulated similarly across only a subset of conditions. Equally important is the need to learn which conditions are the decisive ones in forming such gene sets of interest, and how they relate to diverse conditional covariates, such as disease diagnosis or prognosis.

RESULTS

We present a method for automatically identifying such candidate sets of biologically relevant genes using a combination of principal components analysis and information theoretic metrics. To enable easy use of our methods, we have developed a data analysis package that facilitates visualization and subsequent data mining of the independent sources of significant variation present in gene microarray expression datasets (or in any other similarly structured high-dimensional dataset). We applied these tools to two public datasets and highlighted the sets of genes most affected by specific subsets of conditions (e.g. tissues, treatments, samples). Statistically significant associations for highlighted gene sets were shown via global analysis for Gene Ontology term enrichment. Together with covariate associations, the tool provides a basis for building testable hypotheses about the biological or experimental causes of observed variation.

CONCLUSION

We provide an unsupervised data mining technique for diverse microarray expression datasets that is distinct from major methods now in routine use. In test uses, this method, based on publicly available gene annotations, appears to identify numerous sets of biologically relevant genes. It has proven especially valuable in instances where there are many diverse conditions (tens to hundreds of different tissues or cell types), a situation in which many clustering and ordering algorithms become problematic. This approach also shows promise in other topic domains such as multi-spectral imaging datasets.
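
The general idea in the abstract, using principal components to find small gene sets that vary together across only some conditions, can be illustrated with a minimal sketch. This is not the authors' analysis package. The function name `extreme_genes_per_component`, its parameters, and the synthetic data are assumptions made for illustration, and the information theoretic metrics mentioned in the abstract are not reproduced here. The sketch runs PCA through a singular value decomposition of a genes-by-conditions matrix, then lists the genes with the most extreme scores on each component together with that component's condition loadings.

```python
import numpy as np


def extreme_genes_per_component(expr, n_components=2, n_extreme=5):
    """Hypothetical helper: PCA on a genes x conditions matrix.

    For each leading component, returns the indices of the genes with the
    lowest and highest scores, plus the per-condition loadings.
    """
    expr = np.asarray(expr, dtype=float)
    # Center each condition (column) so components describe variation among genes.
    centered = expr - expr.mean(axis=0, keepdims=True)
    # centered = u @ diag(s) @ vt; rows of vt are the component directions.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u * s                      # gene scores on each component
    explained = s**2 / np.sum(s**2)     # fraction of variance per component

    results = []
    for k in range(n_components):
        order = np.argsort(scores[:, k])  # ascending by score
        results.append({
            "component": k,
            "explained_variance_ratio": float(explained[k]),
            "low_genes": order[:n_extreme].tolist(),
            "high_genes": order[::-1][:n_extreme].tolist(),
            "condition_loadings": vt[k].copy(),
        })
    return results


# Synthetic example (assumed data): 50 genes, 8 conditions, where genes 0-4
# are up-regulated together in conditions 0-2 only.
rng = np.random.default_rng(0)
expr = rng.normal(0.0, 0.1, size=(50, 8))
expr[:5, :3] += 3.0

first = extreme_genes_per_component(expr)[0]
print("variance explained:", round(first["explained_variance_ratio"], 3))
print("high genes:", sorted(first["high_genes"]))
print("low genes:", sorted(first["low_genes"]))
print("conditions with largest |loading|:",
      sorted(np.argsort(np.abs(first["condition_loadings"]))[-3:].tolist()))
```

Because the sign of a principal component is arbitrary, the co-regulated genes may appear at either the high or the low end of the component, so both ends are worth inspecting. The conditions with the largest absolute loadings indicate which conditions drive the gene set.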

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2fc0/1501050/bc3a0f6fb9bc/1471-2105-7-194-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2fc0/1501050/04592d6a4eda/1471-2105-7-194-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2fc0/1501050/062cdd7196f2/1471-2105-7-194-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2fc0/1501050/4be24a0fd6d8/1471-2105-7-194-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2fc0/1501050/2e961363d1ce/1471-2105-7-194-5.jpg

Similar articles

1
Mining gene expression data by interpreting principal components.
BMC Bioinformatics. 2006 Apr 7;7:194. doi: 10.1186/1471-2105-7-194.
2
Biclustering of microarray data with MOSPO based on crowding distance.
BMC Bioinformatics. 2009 Apr 29;10 Suppl 4(Suppl 4):S9. doi: 10.1186/1471-2105-10-S4-S9.
3
Biclustering of gene expression data by Non-smooth Non-negative Matrix Factorization.
BMC Bioinformatics. 2006 Feb 17;7:78. doi: 10.1186/1471-2105-7-78.
4
Biologically supervised hierarchical clustering algorithms for gene expression data.
Conf Proc IEEE Eng Med Biol Soc. 2006;2006:5515-8. doi: 10.1109/IEMBS.2006.260308.
5
Exploring matrix factorization techniques for significant genes identification of Alzheimer's disease microarray gene expression data.
BMC Bioinformatics. 2011;12 Suppl 5(Suppl 5):S7. doi: 10.1186/1471-2105-12-S5-S7. Epub 2011 Jul 27.
6
Discovering biclusters in gene expression data based on high-dimensional linear geometries.
BMC Bioinformatics. 2008 Apr 23;9:209. doi: 10.1186/1471-2105-9-209.
7
Spectral embedding finds meaningful (relevant) structure in image and microarray data.
BMC Bioinformatics. 2006 Feb 16;7:74. doi: 10.1186/1471-2105-7-74.
8
Effect of data normalization on fuzzy clustering of DNA microarray data.
BMC Bioinformatics. 2006 Mar 14;7:134. doi: 10.1186/1471-2105-7-134.
9
CoXpress: differential co-expression in gene expression data.
BMC Bioinformatics. 2006 Nov 20;7:509. doi: 10.1186/1471-2105-7-509.
10
Microarray data mining using landmark gene-guided clustering.
BMC Bioinformatics. 2008 Feb 11;9:92. doi: 10.1186/1471-2105-9-92.

Cited by

1
Erythropoiesis and Gene Expression Analysis in Erythroid Progenitor Cells Derived from Patients with Hemoglobin H/Constant Spring Disease.
Int J Mol Sci. 2024 Oct 19;25(20):11246. doi: 10.3390/ijms252011246.
2
Model-based dimensionality reduction for single-cell RNA-seq using generalized bilinear models.
bioRxiv. 2024 Feb 16:2023.04.21.537881. doi: 10.1101/2023.04.21.537881.
3
Analysis of Dormancy-Associated Transcriptional Networks Reveals a Shared Quiescence Signature in Lung and Colorectal Cancer.
Int J Mol Sci. 2022 Aug 30;23(17):9869. doi: 10.3390/ijms23179869.
4
A multivariate statistical test for differential expression analysis.
Sci Rep. 2022 May 18;12(1):8265. doi: 10.1038/s41598-022-12246-w.
5
Islet sympathetic innervation and islet neuropathology in patients with type 1 diabetes.
Sci Rep. 2021 Mar 22;11(1):6562. doi: 10.1038/s41598-021-85659-8.
6
Phylostratic Shift of Whole-Genome Duplications in Normal Mammalian Tissues towards Unicellularity Is Driven by Developmental Bivalent Genes and Reveals a Link to Cancer.
Int J Mol Sci. 2020 Nov 19;21(22):8759. doi: 10.3390/ijms21228759.
7
A new metric for understanding hidden political influences from voting records.
PLoS One. 2020 Sep 1;15(9):e0238481. doi: 10.1371/journal.pone.0238481. eCollection 2020.
8
Transcriptional Basis for Differential Thermosensitivity of Seedlings of Various Tomato Genotypes.
Genes (Basel). 2020 Jun 16;11(6):655. doi: 10.3390/genes11060655.
9
A pre-existing population of ZEB2 quiescent cells with stemness and mesenchymal features dictate chemoresistance in colorectal cancer.
J Exp Clin Cancer Res. 2020 Jan 8;39(1):2. doi: 10.1186/s13046-019-1505-4.
10
In vitro gill cell monolayer successfully reproduces in vivo Atlantic salmon host responses to Neoparamoeba perurans infection.
Fish Shellfish Immunol. 2019 Mar;86:287-300. doi: 10.1016/j.fsi.2018.11.029. Epub 2018 Nov 17.

References

1
An expression-based site of origin diagnostic method designed for clinical application to cancer of unknown origin.
Cancer Res. 2005 May 15;65(10):4031-40. doi: 10.1158/0008-5472.CAN-04-3617.
2
A mathematical and computational framework for quantitative comparison and integration of large-scale gene expression data.
Nucleic Acids Res. 2005 May 10;33(8):2580-94. doi: 10.1093/nar/gki536. Print 2005.
3
A gene atlas of the mouse and human protein-encoding transcriptomes.
Proc Natl Acad Sci U S A. 2004 Apr 20;101(16):6062-7. doi: 10.1073/pnas.0400782101. Epub 2004 Apr 9.
4
An unsupervised approach to identify molecular phenotypic components influencing breast cancer features.
Cancer Res. 2004 Mar 1;64(5):1584-8. doi: 10.1158/0008-5472.can-03-3208.
5
PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes.
Nat Genet. 2003 Jul;34(3):267-73. doi: 10.1038/ng1180.
6
Iterative signature algorithm for the analysis of large-scale gene expression data.
Phys Rev E Stat Nonlin Soft Matter Phys. 2003 Mar;67(3 Pt 1):031902. doi: 10.1103/PhysRevE.67.031902. Epub 2003 Mar 11.
7
From patterns to pathways: gene expression data analysis comes of age.
Nat Genet. 2002 Dec;32 Suppl:502-8. doi: 10.1038/ng1033.
8
Nonparametric methods for identifying differentially expressed genes in microarray data.
Bioinformatics. 2002 Nov;18(11):1454-61. doi: 10.1093/bioinformatics/18.11.1454.
9
Revealing modular organization in the yeast transcriptional network.
Nat Genet. 2002 Aug;31(4):370-7. doi: 10.1038/ng941. Epub 2002 Jul 22.
10
Genesis: cluster analysis of microarray data.
Bioinformatics. 2002 Jan;18(1):207-8. doi: 10.1093/bioinformatics/18.1.207.