Yılmaz Isıkhan Selen, Karabulut Erdem, Alpar Celal Reha
Vocational School of Social Sciences, Hacettepe University, Ankara, Turkey; Department of Biostatistics, Faculty of Medicine, Hacettepe University, Ankara, Turkey.
Department of Biostatistics, Faculty of Medicine, Hacettepe University, Ankara, Turkey.
Comput Math Methods Med. 2016;2016:6794916. doi: 10.1155/2016/6794916. Epub 2016 Dec 20.
Evaluating the success of dose prediction based on genetic or clinical data has substantially advanced recently. The aim of this study is to predict various clinical dose values from DNA gene expression datasets using data mining techniques. Eleven real gene expression datasets containing dose values were included. First, important genes for dose prediction were selected using iterative sure independence screening. Then, the performances of regression trees (RTs), support vector regression (SVR), RT bagging, SVR bagging, and RT boosting were examined. The results demonstrated that a regression-based feature selection method substantially reduced the number of irrelevant genes from raw datasets. Overall, the best prediction performance in nine of 11 datasets was achieved using SVR; the second most accurate performance was provided using a gradient-boosting machine (GBM). Analysis of various dose values based on microarray gene expression data identified common genes found in our study and the referenced studies. According to our findings, SVR and GBM can be good predictors of dose-gene datasets. The study also identified a sample size of n = 25 as a cutoff point for RT bagging to outperform a single RT.
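The gene-screening step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it shows plain, one-pass sure independence screening (ranking genes by absolute marginal correlation with the dose and keeping the top d), whereas the study used the iterative variant. The function names `pearson` and `sis_screen`, the default d = floor(n / log n), and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no marginal signal
    return sxy / math.sqrt(sxx * syy)


def sis_screen(X, y, d=None):
    """Return sorted indices of the d genes most correlated with y.

    X is a list of samples, each a list of gene expression values.
    d defaults to floor(n / log n), a commonly used screening size
    (an assumption here, not a setting taken from the study).
    """
    n, p = len(X), len(X[0])
    if d is None:
        d = max(1, int(n / math.log(n)))
    scores = []
    for j in range(p):
        column = [row[j] for row in X]
        scores.append((abs(pearson(column, y)), j))
    scores.sort(reverse=True)  # largest |correlation| first
    return sorted(j for _, j in scores[:d])


# Synthetic illustration only; these are not data from the study.
rng = random.Random(0)
n_samples, n_genes = 60, 200
X = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_samples)]
# Hypothetical dose that depends on genes 3 and 7 plus small noise.
dose = [2.0 * row[3] - 2.0 * row[7] + rng.gauss(0, 0.1) for row in X]

selected = sis_screen(X, dose)
print(len(selected), "genes kept out of", n_genes)
```

The reduced gene set would then be passed to the regressors compared in the study (RT, SVR, and their bagged or boosted ensembles), with prediction error estimated on held-out samples.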