Manipulating measurement scales in medical statistical analysis and data mining: A review of methodologies.

Authors

Marateb Hamid Reza, Mansourian Marjan, Adibi Peyman, Farina Dario

Affiliations

Department of Biomedical Engineering, Engineering Faculty, the University of Isfahan, Isfahan, Iran.

Department of Biostatistics and Epidemiology, Health School, Isfahan University of Medical Sciences, Isfahan, Iran.

Publication information

J Res Med Sci. 2014 Jan;19(1):47-56.

Abstract

BACKGROUND

Selecting the correct statistical test and data mining method depends strongly on the measurement scale of the data, the type of variables, and the purpose of the analysis. Different measurement scales are studied in detail, and statistical comparison, modeling, and data mining methods are discussed using several medical examples. We present two clustering examples for ordinal variables, which are the more challenging variables to analyze, using the Wisconsin Breast Cancer Data (WBCD).

ORDINAL-TO-INTERVAL SCALE CONVERSION EXAMPLE

A breast cancer database of nine 10-level ordinal variables for 683 patients was analyzed with two ordinal-scale clustering methods. The performance of the clustering methods was assessed by comparing their output with the gold-standard groups of malignant and benign cases that had been identified by clinical tests.
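The abstract does not name the two clustering methods, so the following is only a generic sketch of how patients described by ordinal levels might be grouped. It assumes a simple k-medoids loop with a Manhattan distance on the level codes; treating the codes as ranks is itself an assumption, and the function names and toy records are hypothetical rather than taken from the WBCD.

```python
def manhattan(a, b):
    """Sum of absolute differences between two vectors of ordinal level codes."""
    return sum(abs(x - y) for x, y in zip(a, b))


def k_medoids(points, initial_medoids, max_iter=100):
    """Alternate assignment and medoid update until the medoids stop changing.

    points          -- list of equal-length tuples of ordinal codes (e.g. 1..10)
    initial_medoids -- indices into points used as starting medoids
    Returns (labels, medoids), where labels[i] is the cluster index of points[i].
    """
    medoids = list(initial_medoids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest medoid.
        labels = [
            min(range(len(medoids)), key=lambda k: manhattan(p, points[medoids[k]]))
            for p in points
        ]
        # Update step: the new medoid minimises total distance within its cluster.
        new_medoids = []
        for k, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == k]
            if not members:  # keep the old medoid if a cluster became empty
                new_medoids.append(old)
                continue
            new_medoids.append(
                min(members, key=lambda i: sum(manhattan(points[i], points[j]) for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids


# Hypothetical toy records with three ordinal variables each (not real patient data).
toy_points = [(1, 1, 2), (2, 1, 1), (1, 2, 1), (9, 10, 8), (8, 9, 10), (10, 8, 9)]
toy_labels, toy_medoids = k_medoids(toy_points, initial_medoids=[0, 3])
print(toy_labels, toy_medoids)
```

A medoid is always one of the observed records, so every cluster centre stays a valid combination of ordinal levels, which is one reason to prefer it over a mean when the data are not truly interval-scaled.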

RESULTS

The sensitivity and accuracy of the two clustering methods were 98% and 96%, respectively. Their specificity was comparable.
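For readers unfamiliar with these measures, the sketch below shows how sensitivity, specificity, and accuracy are conventionally computed once each cluster has been mapped to a class and compared with the gold standard. The function name and the toy labels are hypothetical; they do not reproduce the study's data or its reported figures.

```python
def binary_metrics(truth, predicted, positive="malignant"):
    """Compute sensitivity, specificity and accuracy against a gold standard.

    truth, predicted -- equal-length sequences of class labels
    positive         -- the label counted as the positive (diseased) class
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = fp = tn = fn = 0
    for t, p in zip(truth, predicted):
        if t == positive and p == positive:
            tp += 1  # malignant case correctly flagged
        elif t == positive:
            fn += 1  # malignant case missed
        elif p == positive:
            fp += 1  # benign case wrongly flagged
        else:
            tn += 1  # benign case correctly cleared
    return {
        "sensitivity": tp / (tp + fn),   # share of true positives detected
        "specificity": tn / (tn + fp),   # share of true negatives detected
        "accuracy": (tp + tn) / len(truth),
    }


# Hypothetical labels for ten cases (illustration only).
toy_truth = ["malignant"] * 4 + ["benign"] * 6
toy_predicted = ["malignant"] * 3 + ["benign"] * 6 + ["malignant"]
print(binary_metrics(toy_truth, toy_predicted))
```

The sketch assumes both classes occur in the gold standard; otherwise one of the denominators would be zero.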

CONCLUSION

Using a clustering algorithm that is appropriate to the measurement scale of the study variables yields high performance. Moreover, descriptive and inferential statistics, as well as the modeling approach, must be selected according to the scale of the variables.
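As a small illustration of choosing a descriptive statistic by scale, the sketch below follows the conventional pairing of mode for nominal, median for ordinal, and mean for interval or ratio data. This pairing is a common textbook convention assumed here, not a rule quoted from the abstract, and the function name and sample values are hypothetical.

```python
from statistics import mean, median_low, mode


def central_tendency(values, scale):
    """Return a measure of central tendency suited to the measurement scale."""
    if scale == "nominal":
        return mode(values)          # categories have no order: most frequent value
    if scale == "ordinal":
        return median_low(values)    # ordered levels: a middle value that was actually observed
    if scale in ("interval", "ratio"):
        return mean(values)          # differences are meaningful: arithmetic mean
    raise ValueError(f"unknown measurement scale: {scale!r}")


print(central_tendency(["A", "B", "A", "O"], "nominal"))    # blood groups
print(central_tendency([1, 3, 3, 10], "ordinal"))           # 10-level grades
print(central_tendency([36.0, 37.0, 38.0], "interval"))     # temperatures in Celsius
```

`median_low` is used for the ordinal case so that the result is always one of the observed levels rather than an average of two levels.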

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1681/3963323/8381130eb1ae/JRMS-19-47-g003.jpg