Classification of a large microarray data set: algorithm comparison and analysis of drug signatures.

Authors

Natsoulis Georges, El Ghaoui Laurent, Lanckriet Gert R G, Tolley Alexander M, Leroy Fabrice, Dunlea Shane, Eynon Barrett P, Pearson Cecelia I, Tugendreich Stuart, Jarnagin Kurt

Affiliation

Iconix Pharmaceuticals, Mountain View, CA 94043, USA.

Publication

Genome Res. 2005 May;15(5):724-36. doi: 10.1101/gr.2807605.

Abstract

A large gene expression database has been produced that characterizes the gene expression and physiological effects of hundreds of approved and withdrawn drugs, toxicants, and biochemical standards in various organs of live rats. In order to derive useful biological knowledge from this large database, a variety of supervised classification algorithms were compared using a 597-microarray subset of the data. Our studies show that several types of linear classifiers based on Support Vector Machines (SVMs) and Logistic Regression can be used to derive readily interpretable drug signatures with high classification performance. Both methods can be tuned to produce classifiers of drug treatments in the form of short, weighted gene lists which upon analysis reveal that some of the signature genes have a positive contribution (act as "rewards" for the class-of-interest) while others have a negative contribution (act as "penalties") to the classification decision. The combination of reward and penalty genes enhances performance by keeping the number of false positive treatments low. The results of these algorithms are combined with feature selection techniques that further reduce the length of the drug signatures, an important step towards the development of useful diagnostic biomarkers and low-cost assays. Multiple signatures with no genes in common can be generated for the same classification end-point. Comparison of these gene lists identifies biological processes characteristic of a given class.
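The reward and penalty behaviour described in the abstract can be illustrated with a minimal sketch of a linear signature used as a decision function. This is not the authors' implementation: the gene names, weights, and bias below are hypothetical values chosen for illustration, and the zero decision threshold is an assumption of the sketch.

```python
from typing import Dict, List, Tuple

# Hypothetical signature: gene names and weights are invented for illustration,
# they are not taken from the paper.
SIGNATURE: Dict[str, float] = {
    "gene_A": 1.2,   # positive weight: acts as a "reward"
    "gene_B": 0.7,   # positive weight: acts as a "reward"
    "gene_C": -0.9,  # negative weight: acts as a "penalty"
    "gene_D": -0.4,  # negative weight: acts as a "penalty"
}
BIAS = -0.5  # hypothetical intercept of the linear classifier


def split_signature(weights: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Return (reward genes, penalty genes), each sorted by name."""
    rewards = sorted(g for g, w in weights.items() if w > 0)
    penalties = sorted(g for g, w in weights.items() if w < 0)
    return rewards, penalties


def decision_score(expression: Dict[str, float],
                   weights: Dict[str, float],
                   bias: float) -> float:
    """Linear decision value: weighted sum of expression values plus bias."""
    return sum(w * expression.get(g, 0.0) for g, w in weights.items()) + bias


def classify(expression: Dict[str, float],
             weights: Dict[str, float],
             bias: float) -> bool:
    """Assign the sample to the class of interest when the score is positive."""
    return decision_score(expression, weights, bias) > 0.0


# A treatment that induces only the reward genes is classified as in-class.
in_class = {"gene_A": 1.0, "gene_B": 1.0, "gene_C": 0.0, "gene_D": 0.0}

# A look-alike treatment induces the same reward genes but also the penalty
# genes, which pull its score below zero and avoid a false positive.
look_alike = {"gene_A": 1.0, "gene_B": 1.0, "gene_C": 1.5, "gene_D": 1.0}

if __name__ == "__main__":
    print(split_signature(SIGNATURE))
    print(classify(in_class, SIGNATURE, BIAS))
    print(classify(look_alike, SIGNATURE, BIAS))
```

In this toy example the two samples have identical reward-gene expression, so a signature built from reward genes alone could not separate them; the penalty terms are what keep the look-alike treatment out of the class.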
