
Multicategory classification of 11 neuromuscular diseases based on microarray data using support vector machine.

Author information

Choi Soo Beom, Park Jee Soo, Chung Jai Won, Yoo Tae Keun, Kim Deok Won

Publication information

Annu Int Conf IEEE Eng Med Biol Soc. 2014;2014:3460-3. doi: 10.1109/EMBC.2014.6944367.

Abstract

We applied multicategory machine learning methods to classify 11 neuromuscular disease groups and one control group based on microarray data. To develop multicategory classification models with optimal parameters and features, we systematically evaluated three machine learning algorithms and four feature selection methods using three-fold cross-validation and a grid search. The study included 114 subjects with 11 neuromuscular diseases and 31 control subjects, using microarray data with 22,283 probe sets obtained from the National Center for Biotechnology Information (NCBI). We obtained an accuracy of 100%, a relative classifier information (RCI) of 1.0, and a kappa index of 1.0 with one-versus-one support vector machine (SVM-OVO), one-versus-rest SVM (SVM-OVR), and directed acyclic graph SVM (DAGSVM) models, using the ratio of between-category to within-category sums of squares (BW) as the feature selection method. Each of these three models selected only four features to categorize the 12 groups, offering a time-saving and cost-effective strategy for diagnosing neuromuscular diseases. In addition, SPP1 was the top-ranked gene under the BW method, and a previous study supports a relationship between SPP1 and Duchenne muscular dystrophy (DMD). Used as clinical support tools, our models could classify neuromuscular diseases quickly by computer, enabling time-saving, cost-effective, and accurate diagnosis.
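To make the BW feature ranking and the kappa index concrete, the following is a minimal sketch in plain Python. It is not the authors' code. It assumes the commonly used definition of the BW ratio (for each feature, the sum of squared deviations of class means from the overall mean, divided by the sum of squared deviations of samples from their own class mean) and the standard Cohen's kappa for agreement between true and predicted labels. The function names `bw_ratio`, `top_k_features`, and `cohen_kappa` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def bw_ratio(X, y):
    """Return the between/within sum-of-squares ratio for each feature.

    X is a list of samples (each a list of feature values) and y holds the
    class label of each sample. Assumed definition, not taken from the paper.
    """
    n_samples = len(X)
    n_features = len(X[0])

    # Group sample indices by class label.
    groups = defaultdict(list)
    for i, label in enumerate(y):
        groups[label].append(i)

    ratios = []
    for j in range(n_features):
        column = [row[j] for row in X]
        overall_mean = sum(column) / n_samples
        between = 0.0
        within = 0.0
        for indices in groups.values():
            class_mean = sum(column[i] for i in indices) / len(indices)
            # Each sample in the class contributes the same between-class term.
            between += len(indices) * (class_mean - overall_mean) ** 2
            within += sum((column[i] - class_mean) ** 2 for i in indices)
        # A feature with zero within-class spread separates classes perfectly.
        ratios.append(between / within if within > 0 else float("inf"))
    return ratios


def top_k_features(X, y, k):
    """Return the indices of the k features with the largest BW ratio."""
    ratios = bw_ratio(X, y)
    ranked = sorted(range(len(ratios)), key=lambda j: ratios[j], reverse=True)
    return ranked[:k]


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    expected = sum(
        (y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels
    )
    if expected == 1.0:
        # Degenerate case: both labelings use a single identical class.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

In a pipeline like the one the abstract describes, `top_k_features` would be applied inside each cross-validation fold to the training samples only, with the reduced feature set then passed to the SVM. Ranking features on the full data set before splitting would leak label information into the evaluation.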
