Applying Data Mining Techniques to Extract Hidden Patterns about Breast Cancer Survival in an Iranian Cohort Study.

Authors

Khalkhali Hamid Reza, Lotfnezhad Afshar Hadi, Esnaashari Omid, Jabbari Nasrollah

Affiliations

Inpatient's Safety Research Center, Department of Biostatistics, School of Medicine, Urmia University of Medical Sciences, Urmia, Iran.

Department of Health Information Technology, School of Paramedicine, Urmia University of Medical Sciences, Urmia, Iran.

Publication information

J Res Health Sci. 2016 Winter;16(1):31-5.

Abstract

BACKGROUND

Breast cancer survival has been analyzed with many standard data mining algorithms, a group of which belong to the decision tree category. The ability of decision tree algorithms to visualize and formulate hidden patterns among study variables was the main reason for applying, in the current study, a decision tree algorithm that had not previously been studied for this purpose.

METHODS

The classification and regression trees (CART) algorithm was applied to a breast cancer database containing information on 569 patients from 2007 to 2010. Gini impurity, the measure used for categorical target variables, was utilized. The classification error, which is a function of tree size, was measured by 10-fold cross-validation. The performance of the resulting model was evaluated in terms of accuracy, sensitivity, and specificity.
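As an illustration of the Gini impurity measure mentioned above, the following is a minimal sketch in plain Python. It is not the authors' implementation; the function names and the toy "alive"/"dead" labels are assumptions made for this example and are not taken from the study data.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 for a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def weighted_split_impurity(left, right):
    """Size-weighted Gini impurity of the two child nodes of a binary split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


# Hypothetical toy node (not study data): 6 survivors and 4 deaths.
parent = ["alive"] * 6 + ["dead"] * 4
# Hypothetical candidate split of that node into two children.
left = ["alive"] * 5 + ["dead"] * 1
right = ["alive"] * 1 + ["dead"] * 3

parent_gini = gini_impurity(parent)
split_gini = weighted_split_impurity(left, right)
# A tree-growing procedure of this kind prefers the split with the largest decrease.
impurity_decrease = parent_gini - split_gini
```

In this toy case the parent impurity is 0.48 and the weighted child impurity is about 0.317, so the candidate split reduces impurity by roughly 0.163.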

RESULTS

The CART model produced a decision tree with 17 nodes, 9 of which were associated with a set of rules. The rules were clinically meaningful. Expressed in if-then format, they showed that Stage was the most important variable for predicting breast cancer survival. The accuracy, sensitivity, and specificity were 80.3%, 93.5%, and 53%, respectively.
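For readers unfamiliar with the three reported criteria, the sketch below shows how accuracy, sensitivity, and specificity are conventionally computed from confusion-matrix counts. The abstract does not report the underlying counts, so the numbers used here are hypothetical and are not meant to reproduce the study's figures; the function name is also an assumption of this example.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp/fn: positive cases predicted correctly/incorrectly.
    tn/fp: negative cases predicted correctly/incorrectly.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical counts for illustration only (not from the study).
metrics = classification_metrics(tp=90, fn=10, tn=30, fp=20)
```

With these hypothetical counts, accuracy is 0.8, sensitivity is 0.9, and specificity is 0.6. A pattern of high sensitivity with much lower specificity, as in the reported results, means the positive class is identified far more reliably than the negative class.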

CONCLUSIONS

The current study's model, the first one created with CART, was able to extract useful hidden rules from a relatively small dataset.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a4e/7189091/fcc9b9c6132f/jrhs-16-31-g001.jpg
