Sample size planning for developing classifiers using high-dimensional DNA microarray data.

Authors

Dobbin Kevin K, Simon Richard M

Affiliation

Biometric Research Branch, National Cancer Institute, 6130 Executive Boulevard, Rockville, MD 20852, USA.

Publication

Biostatistics. 2007 Jan;8(1):101-17. doi: 10.1093/biostatistics/kxj036. Epub 2006 Apr 13.

DOI: 10.1093/biostatistics/kxj036
PMID: 16613833
Abstract

Many gene expression studies attempt to develop a predictor of pre-defined diagnostic or prognostic classes. If the classes are similar biologically, then the number of genes that are differentially expressed between the classes is likely to be small compared to the total number of genes measured. This motivates a two-step process for predictor development: a subset of differentially expressed genes is selected for use in the predictor, and the predictor is then constructed from these genes. Both steps introduce variability into the resulting classifier, so both must be incorporated in sample size estimation. We introduce a methodology for sample size determination for prediction in the context of high-dimensional data that captures variability in both steps of predictor development. The methodology is based on a parametric probability model, but permits sample size computations to be carried out in a practical manner without extensive requirements for preliminary data. We find that many prediction problems do not require a large training set of arrays for classifier development.
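The abstract names the two development steps but not the model or formulas, so the following Python sketch is only a Monte Carlo illustration of why both steps matter for sample size; it is not the authors' analytic method. Everything in it is an assumption chosen for illustration: two equally likely classes, `p` independent Gaussian genes with common SD `sigma`, `m` informative genes with class means `-delta` and `+delta`, gene selection by a pooled two-sample t statistic compared against a normal-approximation cutoff at level `alpha`, and a compound-covariate-style predictor (a t-weighted sum of the selected genes). The function names and the planning criterion in `smallest_n` (the smallest per-class `n` whose expected accuracy comes within `tolerance` of the optimal accuracy Phi(sqrt(m)·delta/sigma)) are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

PHI = NormalDist()  # standard normal: PHI.cdf is Phi, PHI.inv_cdf its inverse


def optimal_accuracy(delta, sigma, m):
    """Best achievable accuracy under the assumed model: equal priors, class
    means -delta/+delta on m independent genes with common SD sigma, so the
    Mahalanobis distance between classes is 2*delta*sqrt(m)/sigma."""
    return PHI.cdf(math.sqrt(m) * delta / sigma)


def gene_means(label, p, m, delta):
    """Assumed layout: genes 0..m-1 are informative, the other p - m have mean 0."""
    sign = 1.0 if label == 1 else -1.0
    return [sign * delta if g < m else 0.0 for g in range(p)]


def train_and_score(n, p, m, delta, sigma, alpha, rng):
    """Simulate n >= 2 training arrays per class, run both development steps,
    and return the resulting predictor's exact accuracy under the model."""
    mu = {c: gene_means(c, p, m, delta) for c in (0, 1)}
    data = {c: [[rng.gauss(mu[c][g], sigma) for g in range(p)] for _ in range(n)]
            for c in (0, 1)}
    z_cut = PHI.inv_cdf(1 - alpha / 2)  # normal approximation to the t cutoff

    weights, means = {}, {}
    for g in range(p):
        x0 = [row[g] for row in data[0]]
        x1 = [row[g] for row in data[1]]
        m0, m1 = fmean(x0), fmean(x1)
        ss = sum((v - m0) ** 2 for v in x0) + sum((v - m1) ** 2 for v in x1)
        t = (m1 - m0) / (math.sqrt(ss / (2 * n - 2)) * math.sqrt(2 / n))
        if abs(t) > z_cut:   # step 1: select the gene
            weights[g] = t   # step 2: weight it by its t statistic
            means[g] = (m0, m1)
    if not weights:
        return 0.5  # no gene selected: a coin flip on balanced classes

    # Predict class 1 when the weighted score exceeds the midpoint of the two
    # training-class mean scores (by construction class 1's is the higher one).
    thr = sum(w * (means[g][0] + means[g][1]) / 2 for g, w in weights.items())
    # A new array's score is normal under the model, so accuracy is closed-form.
    sd = sigma * math.sqrt(sum(w * w for w in weights.values()))
    true0 = sum(w * mu[0][g] for g, w in weights.items())
    true1 = sum(w * mu[1][g] for g, w in weights.items())
    return 0.5 * PHI.cdf((thr - true0) / sd) + 0.5 * (1 - PHI.cdf((thr - true1) / sd))


def expected_accuracy(n, p, m, delta, sigma, alpha, reps=50, seed=0):
    """Monte Carlo average over `reps` independent simulated training sets."""
    rng = random.Random(seed)
    return fmean(train_and_score(n, p, m, delta, sigma, alpha, rng) for _ in range(reps))


def smallest_n(p, m, delta, sigma, alpha, tolerance, candidates, reps=50):
    """Hypothetical criterion: smallest per-class n whose expected accuracy is
    within `tolerance` of the optimal accuracy, or None if none qualifies."""
    target = optimal_accuracy(delta, sigma, m) - tolerance
    for n in candidates:
        if expected_accuracy(n, p, m, delta, sigma, alpha, reps) >= target:
            return n
    return None
```

Scoring each trained predictor by its exact conditional accuracy under the Gaussian model, instead of by a simulated test set, leaves the `reps` training sets as the only source of Monte Carlo noise. Because the same small training set is used both to select genes and to estimate their weights, a small `n` both misses informative genes and admits noise genes, which is the two-step variability the abstract says must enter the sample size calculation.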

Similar articles

1
Sample size planning for developing classifiers using high-dimensional DNA microarray data.
Biostatistics. 2007 Jan;8(1):101-17. doi: 10.1093/biostatistics/kxj036. Epub 2006 Apr 13.
2
How large a training set is needed to develop a classifier for microarray data?
Clin Cancer Res. 2008 Jan 1;14(1):108-14. doi: 10.1158/1078-0432.CCR-07-0443.
3
Practical FDR-based sample size calculations in microarray experiments.
Bioinformatics. 2005 Aug 1;21(15):3264-72. doi: 10.1093/bioinformatics/bti519. Epub 2005 Jun 2.
4
Optimal number of features as a function of sample size for various classification rules.
Bioinformatics. 2005 Apr 15;21(8):1509-15. doi: 10.1093/bioinformatics/bti171. Epub 2004 Nov 30.
5
Sample size for FDR-control in microarray data analysis.
Bioinformatics. 2005 Jul 15;21(14):3097-104. doi: 10.1093/bioinformatics/bti456. Epub 2005 Apr 21.
6
Sample size calculations based on ranking and selection in microarray experiments.
Biometrics. 2008 Mar;64(1):217-26. doi: 10.1111/j.1541-0420.2007.00875.x. Epub 2007 Aug 3.
7
Reliable gene signatures for microarray classification: assessment of stability and performance.
Bioinformatics. 2006 Oct 1;22(19):2356-63. doi: 10.1093/bioinformatics/btl400. Epub 2006 Jul 31.
8
Exploiting sample variability to enhance multivariate analysis of microarray data.
Bioinformatics. 2007 Oct 15;23(20):2733-40. doi: 10.1093/bioinformatics/btm441. Epub 2007 Sep 7.
9
Partition resampling and extrapolation averaging: approximation methods for quantifying gene expression in large numbers of short oligonucleotide arrays.
Bioinformatics. 2006 Oct 1;22(19):2364-72. doi: 10.1093/bioinformatics/btl402. Epub 2006 Jul 28.
10
Variance component estimation for mixed model analysis of cDNA microarray data.
Biom J. 2008 Dec;50(6):927-39. doi: 10.1002/bimj.200810476.

Cited by

1
Analysis of the Relationship Between and Cytokine Gene Expression in Hematological Malignancy: Leveraging Explained Artificial Intelligence and Machine Learning for Small Dataset Insights.
Int J Med Sci. 2025 Apr 13;22(9):2208-2226. doi: 10.7150/ijms.109493. eCollection 2025.
2
Predicting Maximal Military Occupational Task Performance from Physical Fitness Tests using Machine Learning.
Med Sci Sports Exerc. 2025 Apr 14;57(9):1877-85. doi: 10.1249/MSS.0000000000003727.
3
Optimizing sample size for supervised machine learning with bulk transcriptomic sequencing: a learning curve approach.
Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf097.
4
Optimal Training Positive Sample Size Determination for Deep Learning with a Validation on CBCT Image Caries Recognition.
Diagnostics (Basel). 2024 Sep 20;14(18):2080. doi: 10.3390/diagnostics14182080.
5
Optimizing Sample Size for Supervised Machine Learning with Bulk Transcriptomic Sequencing: A Learning Curve Approach.
ArXiv. 2024 Sep 10:arXiv:2409.06180v1.
6
Revisiting Concurrent Radiation Therapy, Temozolomide, and the Histone Deacetylase Inhibitor Valproic Acid for Patients with Glioblastoma-Proteomic Alteration and Comparison Analysis with the Standard-of-Care Chemoirradiation.
Biomolecules. 2023 Oct 10;13(10):1499. doi: 10.3390/biom13101499.
7
Identification of Exo-miRNAs: A Summary of the Efforts in Translational Studies Involving Triple-Negative Breast Cancer.
Cells. 2023 May 7;12(9):1339. doi: 10.3390/cells12091339.
8
Rationale and design of the brain magnetic resonance imaging protocol for FutureMS: a longitudinal multi-centre study of newly diagnosed patients with relapsing-remitting multiple sclerosis in Scotland.
Wellcome Open Res. 2022 Mar 16;7:94. doi: 10.12688/wellcomeopenres.17731.1. eCollection 2022.
9
Prediction of transition to psychosis from an at-risk mental state using structural neuroimaging, genetic, and environmental data.
Front Psychiatry. 2023 Jan 19;13:1086038. doi: 10.3389/fpsyt.2022.1086038. eCollection 2022.
10
Autoencoders for sample size estimation for fully connected neural network classifiers.
NPJ Digit Med. 2022 Dec 13;5(1):180. doi: 10.1038/s41746-022-00728-0.