Importance of data structure in comparing two dimension reduction methods for classification of microarray gene expression data.

Author information

Truntzer Caroline, Mercier Catherine, Estève Jacques, Gautier Christian, Roy Pascal

Affiliation

CNRS, UMR 5558--Equipe Biostatistique Santé, Villeurbanne, France.

Publication information

BMC Bioinformatics. 2007 Mar 13;8:90. doi: 10.1186/1471-2105-8-90.

DOI: 10.1186/1471-2105-8-90
PMID: 17355634
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC1831790/

Abstract

BACKGROUND

With the advance of microarray technology, several methods for gene classification and prognosis have already been designed. However, some of these methods, although known under different names, follow similar approaches. This study evaluates how the variance structure of gene expression data influences the performance of methods that describe the relationship between gene expression levels and a given phenotype by projecting the data onto discriminant axes.
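
The projection of data onto discriminant axes can be illustrated with a minimal Python/NumPy sketch of a between-group-analysis-style classifier. It assumes that the discriminant axes come from a plain PCA (computed by SVD) of the matrix of class centroids and that a sample is assigned to the nearest projected centroid. The function names `bga_axes` and `bga_predict` are hypothetical, and the published Between-Group Analysis may differ in details such as how groups are weighted.

```python
import numpy as np


def bga_axes(X, y):
    """Sketch of between-group analysis: PCA (via SVD) of the class-centroid matrix.

    X: (n_samples, n_genes) expression matrix; y: class labels.
    Returns orthonormal discriminant axes as columns, shape (n_genes, n_classes - 1).
    """
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    # One row per class centroid, centred on the overall mean
    centroids = np.vstack([X[y == c].mean(axis=0) for c in classes]) - grand_mean
    # Right singular vectors of the centroid matrix span the between-group space
    _, _, vt = np.linalg.svd(centroids, full_matrices=False)
    return vt[: len(classes) - 1].T


def bga_predict(X_train, y_train, X_new):
    """Assign each new sample to the nearest class centroid on the BGA axes."""
    classes = np.unique(y_train)
    axes = bga_axes(X_train, y_train)
    grand_mean = X_train.mean(axis=0)
    centroids = np.vstack([X_train[y_train == c].mean(axis=0) for c in classes])
    cent_scores = (centroids - grand_mean) @ axes  # (n_classes, n_axes)
    new_scores = (X_new - grand_mean) @ axes       # (n_new, n_axes)
    # Squared Euclidean distance from every new sample to every projected centroid
    dist = ((new_scores[:, None, :] - cent_scores[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]
```

With K classes the sketch yields at most K - 1 axes, which is why a two-group problem reduces to a single discriminant axis.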

RESULTS

We compared Between-Group Analysis and Discriminant Analysis (with prior dimension reduction through Partial Least Squares or Principal Components Analysis). A geometric approach showed that these two methods are strongly related but differ in the way they handle data structure, and data structure in turn helps explain their predictive efficiency. Three main structure situations can be identified. When the clusters of points are clearly separated, both methods perform equally well. When the clusters overlap, both methods fail to give useful predictions. In intermediate situations, the projection must account for the configuration of the clusters of points to improve prediction; for this, we recommend Discriminant Analysis. In addition, an innovative simulation approach generated the three main structures by modelling different partitions of the total variance into within-group and between-group variance. These simulated datasets were used to complement some well-known public datasets, in order to investigate the methods' behaviour across a wide variety of structure situations. To examine the structure of a dataset before analysis and to preselect an appropriate method a priori, we proposed a two-graph preliminary visualization tool: patients are plotted on the Between-Group Analysis discriminant axis (x-axis) against, respectively, the first and the second within-group Principal Components Analysis component (y-axis).
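
The simulation idea (partitioning the total variance into within-group and between-group parts) and the two-graph preliminary view can be sketched together as follows. This is an illustrative assumption, not the authors' simulation protocol: `simulate_two_groups` and its `between_frac` parameter (the share of each informative gene's unit variance assigned to the group effect) are hypothetical, and `two_graph_coordinates` takes the within-group PCA to be an SVD of the data after each sample's own group centroid has been subtracted.

```python
import numpy as np


def simulate_two_groups(n_per_group, n_genes, n_informative, between_frac, rng):
    """Hypothetical simulator: two equal-sized groups, every gene with unit variance.

    For the first `n_informative` genes, a share `between_frac` of that unit
    variance comes from the group effect and the rest is within-group noise;
    the remaining genes are pure N(0, 1) noise.
    """
    y = np.repeat([0, 1], n_per_group)
    n = 2 * n_per_group
    X = rng.standard_normal((n, n_genes))
    a = np.sqrt(between_frac)        # group means at -a / +a give between-group variance a**2
    s = np.sqrt(1.0 - between_frac)  # within-group standard deviation
    sign = np.where(y == 1, 1.0, -1.0)[:, None]
    X[:, :n_informative] = sign * a + s * rng.standard_normal((n, n_informative))
    return X, y


def two_graph_coordinates(X, y):
    """Scores for the two preliminary plots (two-group case).

    x: projection on the single BGA discriminant axis.
    y1, y2: projections on the first two within-group principal components.
    """
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    centroids = np.vstack([X[y == c].mean(axis=0) for c in classes])
    _, _, vt_between = np.linalg.svd(centroids - grand_mean, full_matrices=False)
    # Within-group PCA: subtract each sample's own group centroid before the SVD
    within = X - centroids[np.searchsorted(classes, y)]
    _, _, vt_within = np.linalg.svd(within, full_matrices=False)
    Xc = X - grand_mean
    return Xc @ vt_between[0], Xc @ vt_within[0], Xc @ vt_within[1]
```

Scatter-plotting `x` against `y1`, and `x` against `y2`, gives the two panels of the preliminary view. Under these assumptions, a large `between_frac` should produce clearly split clusters along `x`, a small one superposed clusters, and values in between the intermediate case.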

CONCLUSION

Discriminant Analysis outperformed Between-Group Analysis because it takes the dataset structure into account. Prior knowledge of that structure may guide the choice of analysis method. Simulated datasets with known properties are valuable for assessing and comparing the performance of analysis methods; application to real datasets then checks and validates the results. We therefore warn against using unchallenging datasets for method comparison, such as the Golub dataset, whose structure is such that any method would perform well.
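
The explanation that Discriminant Analysis wins because it takes the data structure into account can be illustrated with a small hypothetical experiment: PCA reduction followed by linear discriminant analysis with a pooled within-group covariance, compared with a BGA-style nearest-centroid rule that ignores that covariance. The toy generator `simulate_correlated` is an assumption designed to create an intermediate structure (a large within-group factor shared by genes 0 and 1 masks a group difference carried by gene 0 alone). The PLS variant of the reduction is not shown, and the sketch does not reproduce any result from the paper.

```python
import numpy as np


def pca_fit(X, k):
    """PCA via SVD; returns the training mean and the first k loadings (n_genes, k)."""
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, vt[:k].T


def lda_fit(Z, y):
    """Linear discriminant analysis with a pooled within-group covariance."""
    classes = np.unique(y)
    means = np.vstack([Z[y == c].mean(axis=0) for c in classes])
    resid = Z - means[np.searchsorted(classes, y)]
    s_within = resid.T @ resid / (len(y) - len(classes))
    log_prior = np.log([np.mean(y == c) for c in classes])
    return classes, means, np.linalg.inv(s_within), log_prior


def lda_predict(model, Z):
    classes, means, s_inv, log_prior = model
    # Linear score per class: z' S^-1 m - m' S^-1 m / 2 + log prior
    scores = Z @ s_inv @ means.T - 0.5 * np.sum((means @ s_inv) * means, axis=1) + log_prior
    return classes[scores.argmax(axis=1)]


def bga_predict(X_train, y_train, X_new):
    """BGA-style rule: nearest centroid on the between-group axes only."""
    classes = np.unique(y_train)
    mu = X_train.mean(axis=0)
    cent = np.vstack([X_train[y_train == c].mean(axis=0) for c in classes]) - mu
    _, _, vt = np.linalg.svd(cent, full_matrices=False)
    axes = vt[: len(classes) - 1].T
    diff = ((X_new - mu) @ axes)[:, None, :] - (cent @ axes)[None, :, :]
    return classes[(diff ** 2).sum(axis=2).argmin(axis=1)]


def simulate_correlated(n_per_group, n_genes, rng):
    """Hypothetical toy data: a large within-group factor is shared by genes 0 and 1,
    while the two groups differ (means -1 / +1) on gene 0 only."""
    y = np.repeat([0, 1], n_per_group)
    n = 2 * n_per_group
    X = 0.5 * rng.standard_normal((n, n_genes))
    X[:, :2] += 3.0 * rng.standard_normal((n, 1))  # same factor value added to genes 0 and 1
    X[:, 0] += np.where(y == 1, 1.0, -1.0)
    return X, y


def compare(n_per_group=300, n_genes=20, k=5, seed=0):
    """Train on one simulated sample, return test accuracies of PCA + DA and of BGA."""
    rng = np.random.default_rng(seed)
    X_tr, y_tr = simulate_correlated(n_per_group, n_genes, rng)
    X_te, y_te = simulate_correlated(n_per_group, n_genes, rng)
    mu, V = pca_fit(X_tr, k)
    model = lda_fit((X_tr - mu) @ V, y_tr)
    acc_da = np.mean(lda_predict(model, (X_te - mu) @ V) == y_te)
    acc_bga = np.mean(bga_predict(X_tr, y_tr, X_te) == y_te)
    return acc_da, acc_bga


if __name__ == "__main__":
    acc_da, acc_bga = compare()
    print(f"PCA + DA accuracy: {acc_da:.3f}; BGA accuracy: {acc_bga:.3f}")
```

In this setting the DA rule can cancel the shared factor by contrasting genes 0 and 1, so its accuracy would be expected to exceed that of the BGA-style rule, whose single axis still carries the factor's variance.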

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c4e/1831790/10ef07b77fef/1471-2105-8-90-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c4e/1831790/965f69f1118b/1471-2105-8-90-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c4e/1831790/8761676f8d76/1471-2105-8-90-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c4e/1831790/8e104b609e8f/1471-2105-8-90-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c4e/1831790/8479cf92938c/1471-2105-8-90-4.jpg

Similar articles

1
Importance of data structure in comparing two dimension reduction methods for classification of microarray gene expression data.
BMC Bioinformatics. 2007 Mar 13;8:90. doi: 10.1186/1471-2105-8-90.
2
Tumor classification by partial least squares using microarray gene expression data.
Bioinformatics. 2002 Jan;18(1):39-50. doi: 10.1093/bioinformatics/18.1.39.
3
Dimension reduction for classification with gene expression microarray data.
Stat Appl Genet Mol Biol. 2006;5:Article6. doi: 10.2202/1544-6115.1147. Epub 2006 Feb 24.
4
Multi-class tumor classification by discriminant partial least squares using microarray gene expression data and assessment of classification models.
Comput Biol Chem. 2004 Jul;28(3):235-44. doi: 10.1016/j.compbiolchem.2004.05.002.
5
Gene features selection for three-class disease classification via multiple orthogonal partial least square discriminant analysis and S-plot using microarray data.
PLoS One. 2013 Dec 30;8(12):e84253. doi: 10.1371/journal.pone.0084253. eCollection 2013.
6
A combinational feature selection and ensemble neural network method for classification of gene expression data.
BMC Bioinformatics. 2004 Sep 27;5:136. doi: 10.1186/1471-2105-5-136.
7
Graph constrained discriminant analysis: a new method for the integration of a graph into a classification process.
PLoS One. 2011;6(10):e26146. doi: 10.1371/journal.pone.0026146. Epub 2011 Oct 14.
8
Multi-class cancer classification via partial least squares with gene expression profiles.
Bioinformatics. 2002 Sep;18(9):1216-26. doi: 10.1093/bioinformatics/18.9.1216.
9
Improving gene set analysis of microarray data by SAM-GS.
BMC Bioinformatics. 2007 Jul 5;8:242. doi: 10.1186/1471-2105-8-242.
10
Partial least squares dimension reduction for microarray gene expression data with a censored response.
Math Biosci. 2005 Jan;193(1):119-37. doi: 10.1016/j.mbs.2004.10.007. Epub 2005 Jan 22.

Cited by

1
Histology image analysis for carcinoma detection and grading.
Comput Methods Programs Biomed. 2012 Sep;107(3):538-56. doi: 10.1016/j.cmpb.2011.12.007. Epub 2012 Mar 20.
2
Investigating the efficacy of nonlinear dimensionality reduction schemes in classifying gene and protein expression studies.
IEEE/ACM Trans Comput Biol Bioinform. 2008 Jul-Sep;5(3):368-84. doi: 10.1109/TCBB.2008.36.
3
Using the information embedded in the testing sample to break the limits caused by the small sample size in microarray-based classification.
BMC Bioinformatics. 2008 Jun 14;9:280. doi: 10.1186/1471-2105-9-280.

References

1
Comparison and evaluation of methods for generating differentially expressed gene lists from microarray data.
BMC Bioinformatics. 2006 Jul 26;7:359. doi: 10.1186/1471-2105-7-359.
2
Dimension reduction for classification with gene expression microarray data.
Stat Appl Genet Mol Biol. 2006;5:Article6. doi: 10.2202/1544-6115.1147. Epub 2006 Feb 24.
3
PLS dimension reduction for classification with microarray data.
Stat Appl Genet Mol Biol. 2004;3:Article33. doi: 10.2202/1544-6115.1075. Epub 2004 Nov 23.
4
MADE4: an R package for multivariate analysis of gene expression data.
Bioinformatics. 2005 Jun 1;21(11):2789-90. doi: 10.1093/bioinformatics/bti394. Epub 2005 Mar 29.
5
Gene expression profile of adult T-cell acute lymphocytic leukemia identifies distinct subsets of patients with different response to therapy and survival.
Blood. 2004 Apr 1;103(7):2771-8. doi: 10.1182/blood-2003-09-3243. Epub 2003 Dec 18.
6
Between-group analysis of microarray data.
Bioinformatics. 2002 Dec;18(12):1600-8. doi: 10.1093/bioinformatics/18.12.1600.
7
Gene expression correlates of clinical prostate cancer behavior.
Cancer Cell. 2002 Mar;1(2):203-9. doi: 10.1016/s1535-6108(02)00030-2.
8
Tumor classification by partial least squares using microarray gene expression data.
Bioinformatics. 2002 Jan;18(1):39-50. doi: 10.1093/bioinformatics/18.1.39.
9
Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning.
Nat Med. 2002 Jan;8(1):68-74. doi: 10.1038/nm0102-68.
10
Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.
Science. 1999 Oct 15;286(5439):531-7. doi: 10.1126/science.286.5439.531.