Correction of unexpected distributions of P values from analysis of whole genome arrays by rectifying violation of statistical assumptions.

Affiliation

MRC Lifecourse Epidemiology Unit, University of Southampton, Southampton, UK.

Publication

BMC Genomics. 2013 Mar 11;14:161. doi: 10.1186/1471-2164-14-161.

DOI:10.1186/1471-2164-14-161
PMID:23496791
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3610227/
Abstract

BACKGROUND

Statistical analysis of genome-wide microarrays can result in many thousands of identical statistical tests being performed as each probe is tested for an association with a phenotype of interest. If there were no association between any of the probes and the phenotype, the distribution of P values obtained from statistical tests would resemble a Uniform distribution. If a selection of probes were significantly associated with the phenotype we would expect to observe P values for these probes of less than the designated significance level, alpha, resulting in more P values of less than alpha than expected by chance.
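The expectation described above can be checked with a small simulation. The sketch below is illustrative only and is not taken from the paper: it assumes a two-group comparison per probe with known unit variance (a z-test), and the number of probes, group size, effect size and random seed are all made-up values. With no associated probes, roughly a fraction alpha of P values falls below alpha and the deciles are about equally filled; when 10% of probes carry a real effect, the fraction below alpha rises.

```python
import math
import random


def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate_p_values(n_probes, n_per_group, frac_associated, effect, rng):
    """Return one P value per probe from a two-group z-test with known unit variance."""
    n_associated = int(round(n_probes * frac_associated))
    se = math.sqrt(2.0 / n_per_group)  # standard error of the difference in means
    p_values = []
    for probe in range(n_probes):
        # Only the first n_associated probes get a true group difference.
        shift = effect if probe < n_associated else 0.0
        group_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        group_b = [rng.gauss(shift, 1.0) for _ in range(n_per_group)]
        diff = sum(group_b) / n_per_group - sum(group_a) / n_per_group
        p_values.append(two_sided_p(diff / se))
    return p_values


def fraction_below(p_values, alpha):
    """Share of P values smaller than alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


def decile_counts(p_values):
    """Histogram of P values in ten equal-width bins on [0, 1]."""
    counts = [0] * 10
    for p in p_values:
        counts[min(int(p * 10), 9)] += 1
    return counts


rng = random.Random(2013)  # arbitrary seed, for reproducibility only
alpha = 0.05
null_p = simulate_p_values(2000, 20, 0.0, 0.0, rng)
mixed_p = simulate_p_values(2000, 20, 0.10, 1.5, rng)

print("no associations: fraction below alpha =", fraction_below(null_p, alpha))
print("10% associated:  fraction below alpha =", fraction_below(mixed_p, alpha))
print("null histogram by decile:", decile_counts(null_p))
```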

RESULTS

In data from a whole genome methylation promoter array we unexpectedly observed P value distributions where there were fewer P values less than alpha than would be expected by chance. Our data suggest that a possible reason for this is a violation of the statistical assumptions required for these tests arising from heteroskedasticity. A simple but statistically sound remedy (a heteroskedasticity-consistent covariance matrix estimator to calculate standard errors of regression coefficients that are robust to heteroskedasticity) rectified this violation and resulted in meaningful P value distributions.
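A minimal sketch of how heteroskedasticity can produce too few small P values, and how a heteroskedasticity-consistent estimator can restore them, is given below. Several things here are assumptions rather than details from the abstract: the HC3 variant is used (the abstract does not say which variant the authors chose), P values use a normal approximation to the t distribution, and the data are simulated with a hypothetical phenotype `x` and a noise pattern picked so that the classical standard error is too large. No probe has a true association, so about 5% of P values should fall below alpha = 0.05 if the test is calibrated.

```python
import math
import random


def slope_tests(x, y):
    """Return (classical P, HC3 P) for the slope of y on x with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    resid = [(yi - y_bar) - slope * d for d, yi in zip(dx, y)]

    # Classical OLS variance: one pooled residual variance for every sample.
    var_classical = sum(e * e for e in resid) / (n - 2) / sxx

    # HC3 sandwich variance: each squared residual is weighted by its own
    # (x_i - x_bar)^2 and inflated by 1 / (1 - h_i)^2, where h_i is the leverage.
    var_hc3 = 0.0
    for d, e in zip(dx, resid):
        leverage = 1.0 / n + d * d / sxx
        var_hc3 += d * d * (e / (1.0 - leverage)) ** 2
    var_hc3 /= sxx * sxx

    def p_value(variance):
        # Normal approximation to the t distribution (assumption; adequate for large n).
        return math.erfc(abs(slope) / math.sqrt(2.0 * variance))

    return p_value(var_classical), p_value(var_hc3)


rng = random.Random(161)  # arbitrary seed
n_samples, n_probes, alpha = 80, 2000, 0.05
# Hypothetical phenotype, evenly spaced on [-1, 1] and shared by all probes.
x = [-1.0 + 2.0 * i / (n_samples - 1) for i in range(n_samples)]
# Noise SD shrinks towards the extremes of x; no probe is truly associated.
noise_sd = [1.0 - 0.9 * abs(xi) for xi in x]

classical_hits = 0
robust_hits = 0
for _ in range(n_probes):
    y = [rng.gauss(0.0, sd) for sd in noise_sd]
    p_classical, p_hc3 = slope_tests(x, y)
    classical_hits += p_classical < alpha
    robust_hits += p_hc3 < alpha

frac_classical = classical_hits / n_probes
frac_hc3 = robust_hits / n_probes
print("fraction of P < alpha, classical SE:", frac_classical)
print("fraction of P < alpha, HC3 SE:      ", frac_hc3)
```

In this setup the classical test should reject far less often than alpha, while the HC3-based test should reject at a rate close to alpha. Reversing the noise pattern (larger variance at the extremes of `x`) would instead make the classical test reject too often.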

CONCLUSIONS

The statistical analysis of 'omics data requires careful handling, especially in the choice of statistical test. To obtain meaningful results it is essential that the assumptions behind these tests are carefully examined and any violations rectified where possible, or a more appropriate statistical test chosen.
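One way to examine the constant-variance assumption before choosing a test is a formal diagnostic. The sketch below uses White's test for a simple regression, which is an assumption of this example: the abstract does not name the diagnostic the authors used. Squared residuals are regressed on `x` and `x**2`, and n times the R-squared of that auxiliary regression is compared with a chi-square distribution with 2 degrees of freedom. The data are simulated and the sample size and seed are arbitrary.

```python
import math
import random


def white_test_p(x, y):
    """P value of White's test for heteroskedasticity in a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    sq_resid = [((yi - y_bar) - slope * (xi - x_bar)) ** 2 for xi, yi in zip(x, y)]

    # Auxiliary regression of squared residuals on x and x^2 (all centred).
    x2_bar = sum(xi * xi for xi in x) / n
    w_bar = sum(sq_resid) / n
    u = [xi - x_bar for xi in x]
    v = [xi * xi - x2_bar for xi in x]
    w = [r - w_bar for r in sq_resid]
    suu = sum(a * a for a in u)
    svv = sum(b * b for b in v)
    suv = sum(a * b for a, b in zip(u, v))
    suw = sum(a * c for a, c in zip(u, w))
    svw = sum(b * c for b, c in zip(v, w))
    sww = sum(c * c for c in w)

    # Solve the 2x2 normal equations, then R^2 = explained SS / total SS.
    det = suu * svv - suv * suv
    coef_u = (svv * suw - suv * svw) / det
    coef_v = (suu * svw - suv * suw) / det
    r_squared = (coef_u * suw + coef_v * svw) / sww

    # LM = n * R^2 is asymptotically chi-square with 2 degrees of freedom,
    # whose survival function is exp(-LM / 2).
    return math.exp(-n * r_squared / 2.0)


rng = random.Random(14)  # arbitrary seed
n = 1000
x = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
y_hetero = [rng.gauss(0.0, 1.0 - 0.9 * abs(xi)) for xi in x]  # variance depends on x
y_homo = [rng.gauss(0.0, 1.0) for _ in x]                     # constant variance

p_hetero = white_test_p(x, y_hetero)
p_homo = white_test_p(x, y_homo)
print("White test P, heteroskedastic data:", p_hetero)
print("White test P, homoskedastic data:  ", p_homo)
```

A small P value for the heteroskedastic series signals that the classical standard errors are not trustworthy for that probe, in which case robust standard errors or a different test would be the safer choice.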

Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/94cf/3610227/8a4509f46e6b/1471-2164-14-161-1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/94cf/3610227/491c7ae24965/1471-2164-14-161-2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/94cf/3610227/0953d4316c35/1471-2164-14-161-3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/94cf/3610227/364b1d2fb029/1471-2164-14-161-4.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/94cf/3610227/32b9fab94d9c/1471-2164-14-161-5.jpg
Figure 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/94cf/3610227/dabef5f7fe18/1471-2164-14-161-6.jpg
