A new scoring system in Cystic Fibrosis: statistical tools for database analysis - a preliminary report.

Authors

Hafen G M, Hurst C, Yearwood J, Smith J, Dzalilov Z, Robinson P J

Affiliation

Department of Respiratory Medicine, Royal Children's Hospital Melbourne, Parkville, Victoria, Australia.

Publication information

BMC Med Inform Decis Mak. 2008 Oct 5;8:44. doi: 10.1186/1472-6947-8-44.

Abstract

BACKGROUND

Cystic fibrosis is the most common fatal genetic disorder in the Caucasian population. Scoring systems for assessing cystic fibrosis disease severity have been used for almost 50 years without being adapted to the milder phenotype of the disease in the 21st century. The aim of this project is to develop a new scoring system using a database and various statistical tools. This study protocol reports the development of the statistical tools needed to create such a scoring system.

METHODS

The evaluation is based on the Cystic Fibrosis database of the cohort at the Royal Children's Hospital in Melbourne. Initially, unsupervised clustering of all data records was performed using a range of clustering algorithms, in particular incremental clustering algorithms. The clusters obtained were characterised using rules from decision trees, and the results were examined by clinicians. To obtain a clearer definition of classes, expert opinion on each individual's clinical severity was sought. After data preparation, including expert opinion on each individual's clinical severity on a 3-point scale (mild, moderate and severe disease), two multivariate techniques were used throughout the analysis to establish which would be more successful in feature selection and model derivation: 'Canonical Analysis of Principal Coordinates' (CAP) and 'Linear Discriminant Analysis' (DA). A 3-step procedure was performed: (1) selection of features, (2) extraction of 5 severity classes from the 3 severity classes defined by expert opinion and (3) establishment of calibration datasets.
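As an illustration of how a linear discriminant separates two contiguous severity groups, the following is a minimal two-class Fisher discriminant in plain Python. It is a sketch only and not the authors' model: the two features, the data values and all function names are hypothetical, and equal priors and a shared covariance matrix are assumed.

```python
# Minimal two-class linear discriminant (Fisher), two features only.
# All data below are invented for illustration; they are NOT from the study.

def mean_vector(rows):
    """Column-wise mean of a list of equal-length tuples."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def pooled_covariance(groups, means):
    """Pooled within-class covariance matrix (unbiased, shared by all groups)."""
    dim = len(means[0])
    scatter = [[0.0] * dim for _ in range(dim)]
    total = 0
    for rows, m in zip(groups, means):
        for r in rows:
            d = [r[j] - m[j] for j in range(dim)]
            for a in range(dim):
                for b in range(dim):
                    scatter[a][b] += d[a] * d[b]
        total += len(rows)
    dof = total - len(groups)
    return [[v / dof for v in row] for row in scatter]

def invert_2x2(m):
    """Inverse of a 2x2 matrix; raises if the matrix is singular."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def fit_lda(group_a, group_b):
    """Return (weights, threshold) so that score > 0 favours group_b."""
    mean_a, mean_b = mean_vector(group_a), mean_vector(group_b)
    cov_inv = invert_2x2(pooled_covariance([group_a, group_b], [mean_a, mean_b]))
    diff = [mean_b[j] - mean_a[j] for j in range(2)]
    weights = [sum(cov_inv[i][j] * diff[j] for j in range(2)) for i in range(2)]
    midpoint = [(mean_a[j] + mean_b[j]) / 2 for j in range(2)]
    threshold = sum(w * x for w, x in zip(weights, midpoint))
    return weights, threshold

def discriminant_score(weights, threshold, x):
    """Signed score along the discriminant axis (equal priors assumed)."""
    return sum(w * v for w, v in zip(weights, x)) - threshold

# Hypothetical (feature_1, feature_2) records labelled by expert opinion.
mild = [(90, 0.5), (85, 0.2), (95, 0.8), (88, 0.1)]
moderate = [(65, -0.5), (70, -0.2), (60, -0.9), (72, -0.4)]

weights, threshold = fit_lda(mild, moderate)
for record in [(90, 0.5), (65, -0.5)]:
    score = discriminant_score(weights, threshold, record)
    print(record, "moderate" if score > 0 else "mild")
```

A negative score places a record on the mild side of the boundary and a positive score on the moderate side. How discriminant scores were turned into the intermediate classes is not described in the abstract, so no such step is shown here.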

RESULTS

(1) Feature selection: CAP has a more effective "modelling" focus than DA. (2) Extraction of 5 severity classes: after variables were identified as important in discriminating contiguous CF severity groups on the 3-point scale (mild/moderate and moderate/severe), the Discriminant Function (DF) was used to determine the new groups: mild, intermediate moderate, moderate, intermediate severe and severe disease. (3) Generated confusion tables showed a misclassification rate of 19.1% for males and 16.5% for females, with a majority of misallocations into adjacent severity classes, particularly for males.
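To make the two quantities in result (3) concrete, the sketch below computes a misclassification rate and the share of errors that fall into an adjacent severity class from a confusion table. The counts are invented for illustration and do not reproduce the study's tables; the function name and the row/column convention (rows are expert-assigned classes, columns are predicted classes) are assumptions.

```python
# Misclassification rate and adjacent-class share from a confusion table.
# The counts are hypothetical and are NOT the study's data.

SEVERITY_CLASSES = [
    "mild",
    "intermediate moderate",
    "moderate",
    "intermediate severe",
    "severe",
]

def misclassification_summary(table):
    """table[i][j] = number of records of true class i predicted as class j."""
    size = len(table)
    total = sum(sum(row) for row in table)
    correct = sum(table[i][i] for i in range(size))
    errors = total - correct
    # Adjacent errors: predicted class is exactly one step from the true class.
    adjacent = sum(
        table[i][j]
        for i in range(size)
        for j in range(size)
        if abs(i - j) == 1
    )
    return {
        "rate": errors / total,
        "adjacent_share": adjacent / errors if errors else 0.0,
    }

# Hypothetical 5x5 confusion table, ordered as SEVERITY_CLASSES.
example_table = [
    [18, 2, 0, 0, 0],
    [3, 15, 2, 0, 0],
    [0, 2, 16, 1, 1],
    [0, 0, 3, 14, 3],
    [0, 1, 0, 2, 17],
]

summary = misclassification_summary(example_table)
print(f"misclassification rate: {summary['rate']:.1%}")
print(f"errors in adjacent class: {summary['adjacent_share']:.1%}")
```

Because the severity classes are ordered, reporting the adjacent share alongside the overall rate separates near misses from gross misallocations.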

CONCLUSION

Our preliminary data show that using CAP to select features and linear DA to derive the actual model in a CF database might be helpful in developing a scoring system. However, there are several limitations: in particular, more data entry points are needed to finalize a score, and the statistical tools need to be further refined and validated by re-running the statistical methods on the larger dataset.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2a77/2580762/afb5e6fafa45/1472-6947-8-44-1.jpg