On the probability of correct selection for large k populations, with application to microarray data.

Authors

Cui Xinping, Wilson Jason

Affiliation

Department of Statistics, University of California, Riverside, CA 92521, USA.

Publication information

Biom J. 2008 Oct;50(5):870-83. doi: 10.1002/bimj.200710457.

DOI:10.1002/bimj.200710457
PMID:18932145
Abstract

One frontier of modern statistical research is the problems arising from data sets with extremely large k (>1000) populations, e.g. microarray and neuroimaging data. For many such problems the focus shifts from testing for significance to selecting, filtering, or screening. Classical Ranking and Selection Methodology (RSM) studied the probability of correct selection (PCS). PCS is the probability that the "best" (t = 1) of k populations is truly selected, according to some specified criteria of best. This paper extends and adapts two selection goals from the RSM literature that are suitable for large k problems (d-best and G-best selection). It is then shown how estimation of PCS for selecting multiple (t > 1) populations with d-best and G-best selection can be implemented to provide a useful measure of the quality of a given selection. A simulation study and the application of the proposed method to a benchmark microarray data set show it is an effective and versatile tool for assessing the probability that a particular gene selection or gene filtering step truly obtains the best genes. Moreover, the proposed method is fully general and may be applied to any such extremely large k problem.
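To make the PCS definition above concrete, the following is a minimal Monte Carlo sketch. It is not the estimator proposed in the paper and does not implement d-best or G-best selection, which the abstract does not define. It assumes that "best" means largest true mean, that sample means are normally distributed with a known common standard deviation, and that the selection rule picks the t populations with the largest sample means. The function name `estimate_pcs` and all of its parameters are hypothetical.

```python
import random


def estimate_pcs(true_means, t=1, n_reps=5, sigma=1.0, n_sims=500, seed=0):
    """Monte Carlo estimate of the probability that the t populations with
    the largest sample means are exactly the t populations with the largest
    true means (an illustrative assumption, not the paper's method)."""
    rng = random.Random(seed)
    k = len(true_means)
    # Indices of the truly best t populations (largest true means).
    truly_best = set(
        sorted(range(k), key=lambda i: true_means[i], reverse=True)[:t]
    )
    # Standard error of a sample mean based on n_reps observations.
    se = sigma / n_reps ** 0.5
    hits = 0
    for _ in range(n_sims):
        # Draw each sample mean directly rather than n_reps raw observations.
        sample_means = [rng.gauss(mu, se) for mu in true_means]
        selected = set(
            sorted(range(k), key=lambda i: sample_means[i], reverse=True)[:t]
        )
        # Count the simulation as a correct selection only on an exact match.
        hits += selected == truly_best
    return hits / n_sims


# Hypothetical setting: 200 populations, 5 of which have a shifted mean.
means = [2.0] * 5 + [0.0] * 195
print(estimate_pcs(means, t=5, n_sims=300))
```

In this simplified setting the estimate falls as the gap between the best and remaining populations shrinks, or as k grows with the gap held fixed, which is one reason an exact-match goal becomes demanding for large k.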

Similar articles

1
On the probability of correct selection for large k populations, with application to microarray data.
Biom J. 2008 Oct;50(5):870-83. doi: 10.1002/bimj.200710457.
2
Probability fold change: a robust computational approach for identifying differentially expressed gene lists.
Comput Methods Programs Biomed. 2009 Feb;93(2):124-39. doi: 10.1016/j.cmpb.2008.07.013. Epub 2008 Oct 7.
3
Sharp simultaneous confidence intervals for the means of selected populations with application to microarray data analysis.
Biometrics. 2007 Sep;63(3):767-76. doi: 10.1111/j.1541-0420.2007.00770.x. Epub 2007 Apr 2.
4
An investigation on performance of Significance Analysis of Microarray (SAM) for the comparisons of several treatments with one control in the presence of small-variance genes.
Biom J. 2008 Oct;50(5):801-23. doi: 10.1002/bimj.200710467.
5
Modeling microarray data using a threshold mixture model.
Biometrics. 2004 Jun;60(2):376-87. doi: 10.1111/j.0006-341X.2004.00182.x.
6
Assessing quality of hybridized RNA in Affymetrix GeneChip experiments using mixed-effects models.
Biostatistics. 2006 Apr;7(2):198-212. doi: 10.1093/biostatistics/kxj001. Epub 2005 Aug 31.
7
Simultaneous genes and training samples selection by modified particle swarm optimization for gene expression data classification.
Comput Biol Med. 2009 Jul;39(7):646-9. doi: 10.1016/j.compbiomed.2009.04.008. Epub 2009 May 28.
8
A moment-based method for estimating the proportion of true null hypotheses and its application to microarray gene expression data.
Biostatistics. 2007 Oct;8(4):744-55. doi: 10.1093/biostatistics/kxm002. Epub 2007 Jan 22.
9
Identification of differential gene expression for microarray data using recursive random forest.
Chin Med J (Engl). 2008 Dec 20;121(24):2492-6.
10
A tail strength measure for assessing the overall univariate significance in a dataset.
Biostatistics. 2006 Apr;7(2):167-81. doi: 10.1093/biostatistics/kxj009. Epub 2005 Dec 6.

Cited by

1
Optimized ranking and selection methods for feature selection with application in microarray experiments.
J Biopharm Stat. 2010 Mar;20(2):223-39. doi: 10.1080/10543400903572720.