Comparative study of classification algorithms for immunosignaturing data.

Affiliation

Center for Innovations in Medicine, Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA.

Publication information

BMC Bioinformatics. 2012 Jun 21;13:139. doi: 10.1186/1471-2105-13-139.

Abstract

BACKGROUND

High-throughput technologies such as DNA, RNA, protein, antibody and peptide microarrays are often used to examine differences across drug treatments, diseases, transgenic animals, and other conditions. Typically, one trains a classification system by gathering large amounts of probe-level data and selecting informative features, then classifies test samples using a small number of features. As new microarrays are invented, classification systems that worked well for other array types may not be ideal. Expression microarrays, arguably one of the most prevalent array types, have been used for years to help develop classification algorithms. Many biological assumptions are built into classifiers that were designed for these types of data. One of the more problematic is the assumption of independence, both at the probe level and again at the biological level. Probes for RNA transcripts are designed to bind single transcripts. At the biological level, many genes have dependencies across transcriptional pathways, where co-regulation of transcriptional units may make many genes appear completely dependent. Thus, algorithms that perform well for gene expression data may not be suitable for other technologies with different binding characteristics. The immunosignaturing microarray is based on complex mixtures of antibodies binding to arrays of random-sequence peptides. It relies on many-to-many binding: each peptide can bind multiple antibodies and each antibody can bind multiple peptides. This technology has been shown to be highly reproducible and appears promising for diagnosing a variety of disease states. However, it is not clear which classification algorithm is optimal for analyzing this new type of data.

RESULTS

We characterized several classification algorithms for analyzing immunosignaturing data. We selected several datasets that range from easy to difficult to classify, from simple monoclonal binding to complex binding patterns in asthma patients. We then classified the biological samples using 17 different classification algorithms. Across a wide variety of assessment criteria, we found 'Naïve Bayes' far more useful than other widely used methods because of its simplicity, robustness, speed and accuracy.
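To illustrate the kind of classifier named above, here is a minimal sketch of a Gaussian Naïve Bayes classifier in plain Python. It is not the authors' pipeline. The data are synthetic stand-ins for peptide intensities, and the function names (`fit_gaussian_nb`, `predict_gaussian_nb`, `make_synthetic`), the class labels, the Gaussian likelihood, and the variance floor are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-6):
    """Estimate a log prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    n_total = len(X)
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))  # one tuple of values per feature
        means = [sum(col) / n for col in columns]
        # var_floor (an assumption) avoids division by zero for constant features
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / n_total), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log prior plus summed log likelihoods."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, means, variances) in model.items():
        score = log_prior
        # Features are treated as conditionally independent given the class,
        # so their log likelihoods simply add up.
        for v, m, s2 in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def make_synthetic(n_per_class, n_features, shift, seed):
    """Hypothetical data: two classes whose feature means differ by `shift`."""
    rng = random.Random(seed)
    X, y = [], []
    for label, mu in (("control", 0.0), ("disease", shift)):
        for _ in range(n_per_class):
            X.append([rng.gauss(mu, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


X_train, y_train = make_synthetic(30, 20, shift=1.5, seed=0)
X_test, y_test = make_synthetic(30, 20, shift=1.5, seed=1)
model = fit_gaussian_nb(X_train, y_train)
predictions = [predict_gaussian_nb(model, row) for row in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Training only needs one mean and one variance per feature per class, which is one way to read the simplicity and speed the abstract attributes to the method. Real immunosignaturing data would also need the normalization and feature selection steps that the abstract mentions but does not detail.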

CONCLUSIONS

The 'Naïve Bayes' algorithm appears to accommodate the complex patterns hidden within multilayered immunosignaturing microarray data because of its fundamental mathematical properties.
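The abstract does not say which mathematical properties it has in mind. For orientation only, the textbook Naïve Bayes decision rule is written below. The notation is an assumption, not the paper's: x_j is the measured intensity of peptide j, p is the number of peptides used as features, and c ranges over the class labels.

```latex
\hat{y} \;=\; \arg\max_{c} \; P(c) \prod_{j=1}^{p} P(x_j \mid c)
        \;=\; \arg\max_{c} \left[ \log P(c) + \sum_{j=1}^{p} \log P(x_j \mid c) \right]
```

Each feature contributes one additive log-likelihood term, so the rule stays cheap to evaluate however many peptides are included. The rule treats features as conditionally independent given the class, an assumption that many-to-many antibody-peptide binding is unlikely to satisfy exactly.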
