Internal validation inferences of significant genomic features in genome-wide screening.

Author

Cheng Cheng

Affiliation

Department of Biostatistics, St. Jude Children's Research Hospital, 332 N. Lauderdale Street, Memphis, TN 38105-2794.

Publication

Comput Stat Data Anal. 2009 Jan 15;53(3):788-800. doi: 10.1016/j.csda.2008.07.004.

DOI: 10.1016/j.csda.2008.07.004
PMID: 20084293
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2805177/
Abstract

Although validation of classification and prediction models has been a long-standing topic in statistics and computer learning, the concept of statistical validation in genome-wide screening studies has been vague. Internal validation generally refers to validation procedures based solely on the study dataset. A popular approach to internal validation of identified genomic features has been split-dataset validation. In contrast to this approach, internal validation in genome-wide association screening studies is precisely defined through the concepts of association profile and profile significance. A general procedure and two specific profile significance measures are developed and compared with the split-dataset validation approach in a simulation study. The simulation results clearly demonstrate the strengths and limitations of the profile significance approach to internal validation, especially its enormous gain in sensitivity (power) and stability over split-dataset validation. The proposed methodology is illustrated by an example of genome-wide SNP association analysis in genetic epidemiology.
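
For readers unfamiliar with the baseline that the abstract compares against, the following is a minimal sketch of generic split-dataset validation on simulated data. It is not the profile significance procedure, which the abstract does not specify. Everything here is an illustrative assumption: the function names, the two-sample z-test, the thresholds, the Bonferroni correction over candidates, and the simulated effect size.

```python
import random
from statistics import NormalDist, mean, variance


def z_test_pvalue(x, y):
    """Two-sided p-value from a two-sample z-test on group means."""
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    if se == 0:
        return 1.0  # no variation at all: treat as no evidence
    z = (mean(x) - mean(y)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def screen(features, phenotype, rows, candidates, alpha):
    """Return the candidate feature indices with p < alpha among `rows`."""
    cases = [i for i in rows if phenotype[i] == 1]
    controls = [i for i in rows if phenotype[i] == 0]
    hits = []
    for j in candidates:
        p = z_test_pvalue([features[i][j] for i in cases],
                          [features[i][j] for i in controls])
        if p < alpha:
            hits.append(j)
    return hits


def split_dataset_validation(features, phenotype, alpha=0.01, seed=0):
    """Discover on one random half, then re-test the hits on the other half."""
    rng = random.Random(seed)
    rows = list(range(len(phenotype)))
    rng.shuffle(rows)
    half = len(rows) // 2
    discovery, validation = rows[:half], rows[half:]
    n_features = len(features[0])
    candidates = screen(features, phenotype, discovery,
                        range(n_features), alpha)
    # Bonferroni over the candidates only (an assumed, illustrative choice).
    validated = screen(features, phenotype, validation, candidates,
                       alpha / max(len(candidates), 1))
    return candidates, validated


def simulate(n=400, m=200, n_true=5, effect=0.8, seed=1):
    """Toy data: the first `n_true` features are shifted upward in cases."""
    rng = random.Random(seed)
    phenotype = [i % 2 for i in range(n)]
    features = [[rng.gauss(effect * phenotype[i] if j < n_true else 0.0, 1.0)
                 for j in range(m)] for i in range(n)]
    return features, phenotype


features, phenotype = simulate()
candidates, validated = split_dataset_validation(features, phenotype)
print("candidates:", candidates)
print("validated:", validated)
```

Because each half contains only half of the samples, both the discovery and the confirmation steps run at reduced power, and the result depends on the random split. Rerunning with a different `seed` in `split_dataset_validation` gives a rough feel for the instability the abstract refers to.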
