Sundaresh Suman, Randall Arlo, Unal Berkay, Petersen Jeannine M, Belisle John T, Hartley M Gill, Duffield Melanie, Titball Richard W, Davies D Huw, Felgner Philip L, Baldi Pierre
School of Information and Computer Sciences, University of California, Irvine, CA, USA.
Bioinformatics. 2007 Jul 1;23(13):i508-18. doi: 10.1093/bioinformatics/btm207.
An important application of protein microarray data analysis is identifying a serodiagnostic antigen set that can reliably detect patterns and classify antigen expression profiles. This work addresses this problem using antibody responses to protein markers measured by a novel high-throughput microarray technology. The findings from this study have direct relevance to rapid, broad-based diagnostic and vaccine development.
Protein microarray chips are probed with sera from individuals infected with the bacterium Francisella tularensis, a category A biodefense pathogen. A two-step approach to the diagnostic process is presented: (1) feature (antigen) selection and (2) classification using antigen response measurements obtained from F. tularensis microarrays (244 antigens; 46 infected and 54 healthy human serum measurements). To select antigens, a ranking scheme based on the identification of significant immune responses and differential expression analysis is described. Classification methods including k-nearest neighbors, support vector machines (SVM) and k-means clustering are applied to training data using selected antigen sets of various sizes. SVM-based models yield prediction accuracy of approximately 90% on validation data when antigen set sizes are between 25 and 50. These results strongly indicate that the top-ranked antigens can be considered high-priority candidates for diagnostic development.
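The two-step approach described above (rank antigens, then classify with the top-ranked set) can be illustrated with a minimal Python sketch. It is not the authors' R software: the ranking here uses the absolute Welch t-statistic as a simple stand-in for the paper's ranking scheme, the classifier is a plain k-nearest-neighbors vote, and the data are synthetic. Only the dimensions (244 antigens, 46 infected and 54 healthy sera) come from the abstract; all function names, the 70/30 split, the number of informative antigens, and the effect size are assumptions made for illustration.

```python
import random
import statistics
from math import sqrt


def welch_t(a, b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / sqrt(va / len(a) + vb / len(b))


def rank_antigens(X, y):
    """Step 1: order antigen indices by decreasing |t| between the two classes."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(welch_t(pos, neg)))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def knn_predict(X_train, y_train, x, features, k=5):
    """Step 2: majority vote among the k nearest training samples,
    using squared Euclidean distance restricted to the selected antigens."""
    dists = sorted(
        (sum((row[j] - x[j]) ** 2 for j in features), label)
        for row, label in zip(X_train, y_train)
    )
    votes = sum(label for _, label in dists[:k])
    return 1 if 2 * votes > k else 0


def make_synthetic(n_infected=46, n_healthy=54, n_antigens=244,
                   n_informative=30, shift=1.5, seed=0):
    """Synthetic stand-in data: Gaussian noise, with the first n_informative
    antigens shifted upward in infected samples (an assumption, not real data)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, n in ((1, n_infected), (0, n_healthy)):
        for _ in range(n):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_antigens)]
            if label == 1:
                for j in range(n_informative):
                    row[j] += shift
            X.append(row)
            y.append(label)
    return X, y


def run_demo(top_n=25, k=5, seed=0):
    """Rank antigens on a training split only, then report validation accuracy."""
    X, y = make_synthetic(seed=seed)
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    split = int(0.7 * len(idx))
    train, valid = idx[:split], idx[split:]
    X_tr = [X[i] for i in train]
    y_tr = [y[i] for i in train]
    selected = rank_antigens(X_tr, y_tr)[:top_n]
    correct = sum(knn_predict(X_tr, y_tr, X[i], selected, k) == y[i] for i in valid)
    return selected, correct / len(valid)


if __name__ == "__main__":
    selected, accuracy = run_demo()
    print("top-ranked antigen indices:", selected)
    print("validation accuracy:", accuracy)
```

Ranking is done on the training split alone so that the validation samples play no part in antigen selection; ranking on all samples first would tend to inflate the reported accuracy.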
All software programs are written in R and available at http://www.igb.uci.edu/index.php?page=tools and at http://www.r-project.org.
Supplementary data are available at Bioinformatics online.