A comprehensive evaluation of SAM, the SAM R-package and a simple modification to improve its performance.

Author information

Zhang Shunpu

Affiliation

Department of Statistics, University of Nebraska Lincoln, Lincoln, NE 68583-0963, USA.

Publication information

BMC Bioinformatics. 2007 Jun 29;8:230. doi: 10.1186/1471-2105-8-230.

DOI:10.1186/1471-2105-8-230

PMID:17603887

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC1955751/

Abstract

BACKGROUND

Significance Analysis of Microarrays (SAM) is a popular method for detecting significantly expressed genes and controlling the false discovery rate (FDR). Recently, it has been reported in the literature that the FDR is not well controlled by SAM. Given how widely SAM is used in microarray data analysis, an extensive evaluation of SAM and its associated R-package (sam2.20) is of great importance.

RESULTS

Our study has identified several discrepancies between SAM and sam2.20. One major difference is that SAM and sam2.20 use different methods for estimating FDR. Such discrepancies may cause confusion among researchers who use SAM or develop SAM-like methods. We have also shown that SAM provides no meaningful estimates of FDR, and that this problem has been corrected in sam2.20 by using a different formula for estimating FDR. However, we have found that, even with the improvement sam2.20 has made over SAM, sam2.20 may still produce erroneous and even conflicting results in certain situations. Using an example, we show that the problem with sam2.20 is caused by its use of asymmetric cutoffs, which result from the large variability of null scores at both ends of the order statistics. An obvious approach that avoids the complication of the order statistics is the conventional symmetric cutoff method. For this reason, we have carried out extensive simulations to compare the performance of sam2.20 and the symmetric cutoff method. Finally, a simple modification is proposed to improve the FDR estimation of sam2.20 and the symmetric cutoff method.
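The abstract does not state the formulas involved, so the following is only a generic sketch of what a permutation-based FDR estimate with a symmetric cutoff could look like. It is not the estimator used by SAM or sam2.20, nor the modification proposed in the paper. The function names, the pooled-variance score, the fudge constant `s0`, and the choice to treat all genes as null when counting false calls are assumptions made for illustration.

```python
import numpy as np

def sam_like_scores(x, labels, s0=0.0):
    """Per-gene score: difference in group means over (pooled standard error + s0).

    x is a genes-by-samples array; labels holds 0/1 group membership per sample.
    s0 is a hypothetical fudge constant added to the denominator.
    """
    a, b = x[:, labels == 0], x[:, labels == 1]
    n1, n2 = a.shape[1], b.shape[1]
    pooled_var = (a.var(axis=1, ddof=1) * (n1 - 1)
                  + b.var(axis=1, ddof=1) * (n2 - 1)) / (n1 + n2 - 2)
    se = np.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (b.mean(axis=1) - a.mean(axis=1)) / (se + s0)

def symmetric_cutoff_fdr(x, labels, cutoff, n_perm=100, seed=0):
    """Estimate FDR for the rule 'call gene i significant if |d_i| >= cutoff'.

    The expected number of false calls is the average number of permuted
    scores that pass the same symmetric cutoff (assumption: every gene is
    treated as null, which makes the estimate conservative).
    """
    rng = np.random.default_rng(seed)
    observed = sam_like_scores(x, labels)
    n_called = int(np.sum(np.abs(observed) >= cutoff))

    null_counts = []
    for _ in range(n_perm):
        permuted = rng.permutation(labels)  # shuffle group labels
        null_scores = sam_like_scores(x, permuted)
        null_counts.append(np.sum(np.abs(null_scores) >= cutoff))

    expected_false = float(np.mean(null_counts))
    fdr = min(1.0, expected_false / n_called) if n_called > 0 else 0.0
    return n_called, expected_false, fdr
```

Because the same threshold is applied to both tails, this rule does not depend on the extreme order statistics of the null scores, which is the source of variability the abstract attributes to asymmetric cutoffs.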

CONCLUSION

Our study shows that the most serious drawback of SAM is its poor estimation of FDR. Although this drawback has been corrected in sam2.20, the control of FDR by sam2.20 is still not satisfactory. The comparison between sam2.20 and the symmetric cutoff method reveals that their relative performance depends on the ratio of induced to repressed genes in a microarray data set, and is also affected by the ratio of differentially expressed (DE) to equally expressed (EE) genes and by the distributions of induced and repressed genes. Numerical simulations show that the symmetric cutoff method has the biggest advantage over sam2.20 when there are equal numbers of induced and repressed genes (i.e., the ratio of induced to repressed genes is 1). As this ratio moves away from 1, the advantage of the symmetric cutoff method over sam2.20 gradually diminishes, until eventually sam2.20 becomes significantly better than the symmetric cutoff method when the DE genes are either all induced or all repressed. Simulation results also show that our proposed simple modification provides improved control of FDR for both sam2.20 and the symmetric cutoff method.
