Determining Cutoff Point of Ensemble Trees Based on Sample Size in Predicting Clinical Dose with DNA Microarray Data.

Author information

Yılmaz Isıkhan Selen, Karabulut Erdem, Alpar Celal Reha

Affiliations

Vocational School of Social Sciences, Hacettepe University, Ankara, Turkey; Department of Biostatistics, Faculty of Medicine, Hacettepe University, Ankara, Turkey.

Department of Biostatistics, Faculty of Medicine, Hacettepe University, Ankara, Turkey.

Publication information

Comput Math Methods Med. 2016;2016:6794916. doi: 10.1155/2016/6794916. Epub 2016 Dec 20.

DOI: 10.1155/2016/6794916

PMID: 28096893

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5206477/

Abstract

Evaluating the success of dose prediction based on genetic or clinical data has substantially advanced recently. The aim of this study is to predict various clinical dose values from DNA gene expression datasets using data mining techniques.

Eleven real gene expression datasets containing dose values were included. First, important genes for dose prediction were selected using iterative sure independence screening. Then, the performances of regression trees (RTs), support vector regression (SVR), RT bagging, SVR bagging, and RT boosting were examined.

The results demonstrated that a regression-based feature selection method substantially reduced the number of irrelevant genes from raw datasets. Overall, the best prediction performance in nine of 11 datasets was achieved using SVR; the second most accurate performance was provided using a gradient-boosting machine (GBM).

Analysis of various dose values based on microarray gene expression data identified genes common to our study and the referenced studies. According to our findings, SVR and GBM can be good predictors for dose-gene datasets. The study also identified a sample size of n = 25 as the cutoff point for RT bagging to outperform a single RT.
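The comparison between a single RT and RT bagging mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes depth-1 regression trees (stumps) as a stand-in for full regression trees, uses only the Python standard library, and runs on synthetic data rather than the study's microarray datasets. All function names (`fit_stump`, `fit_bagging`, and so on) are hypothetical and chosen for illustration.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def fit_stump(X, y):
    """Fit a depth-1 regression tree: the single (feature, threshold)
    split that minimises the sum of squared errors (SSE)."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            ml, mr = mean(left), mean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:
        # No usable split (possible in a bootstrap sample): predict the mean.
        m = mean(y)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    """Predict with a stump given as (feature, threshold, left_mean, right_mean)."""
    j, t, ml, mr = stump
    return ml if row[j] <= t else mr


def fit_bagging(X, y, n_trees=50, seed=0):
    """Bagging: fit one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict_bagging(stumps, row):
    """Average the predictions of all bootstrap stumps."""
    return mean([predict_stump(s, row) for s in stumps])


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)) ** 0.5


if __name__ == "__main__":
    rng = random.Random(1)

    def make_data(n, n_genes=5):
        # Synthetic stand-in: "dose" depends on gene 0 only, plus noise.
        X = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(n)]
        y = [2.0 * row[0] + rng.gauss(0, 0.5) for row in X]
        return X, y

    X_train, y_train = make_data(25)   # n = 25, the sample size named in the abstract
    X_test, y_test = make_data(200)
    single = fit_stump(X_train, y_train)
    bagged = fit_bagging(X_train, y_train)
    print("single RT RMSE:", rmse(y_test, [predict_stump(single, r) for r in X_test]))
    print("bagged RT RMSE:", rmse(y_test, [predict_bagging(bagged, r) for r in X_test]))
```

Repeating the demo over a range of training sizes would be one way to look for a sample-size cutoff of the kind the abstract reports, though results on synthetic data with stumps should not be expected to reproduce the study's n = 25 finding.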

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7127/5206477/bc1c366100ee/CMMM2016-6794916.001.jpg
