SigMat: a classification scheme for gene signature matching.

Author information

Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL, USA.

Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Champaign, IL, USA.

Publication information

Bioinformatics. 2018 Jul 1;34(13):i547-i554. doi: 10.1093/bioinformatics/bty251.

Abstract

MOTIVATION

Several large-scale efforts have been made to collect gene expression signatures from a variety of biological conditions, such as response of cell lines to treatment with drugs, or tumor samples with different characteristics. These gene signature collections are utilized through bioinformatics tools for 'signature matching', whereby a researcher studying an expression profile can identify previously cataloged biological conditions most related to their profile. Signature matching tools typically retrieve from the collection the signature that has the highest similarity to the user-provided profile. Alternatively, classification models may be applied where each biological condition in the signature collection is a class label; however, such models are trained on the collection of available signatures and may not generalize to the novel cellular context or cell line of the researcher's expression profile.
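
The similarity-based retrieval described above can be illustrated with a minimal sketch. It is not SigMat and not the method of any specific tool: cosine similarity is assumed as the similarity score, and the function name `match_signature`, the drug labels and the four-gene toy values are all hypothetical.

```python
import numpy as np

def match_signature(query, signatures, labels):
    """Return the label and score of the cataloged signature closest to the query.

    Assumption: similarity is measured by cosine similarity; real tools
    may use other scores (e.g. rank-based or enrichment-based ones).
    """
    # Scale each cataloged signature (one per row) to unit length.
    sig_unit = signatures / np.linalg.norm(signatures, axis=1, keepdims=True)
    # Scale the query profile to unit length as well.
    query_unit = query / np.linalg.norm(query)
    # Dot products of unit vectors are cosine similarities.
    scores = sig_unit @ query_unit
    best = int(np.argmax(scores))
    return labels[best], float(scores[best])

# Hypothetical toy collection: three drug signatures over four genes.
signatures = np.array([
    [ 1.0, -0.5,  0.2,  0.0],
    [-0.8,  0.9,  0.1,  0.3],
    [ 0.1,  0.1, -1.0,  0.7],
])
labels = ["drug_A", "drug_B", "drug_C"]

# Hypothetical query expression profile from the researcher.
query = np.array([0.9, -0.4, 0.1, 0.1])

label, score = match_signature(query, signatures, labels)
print(label, round(score, 3))
```

A retrieval scheme of this kind returns the single best-scoring catalog entry. The classification alternative mentioned above instead treats every cataloged condition as a class label and learns a model over the whole collection.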

RESULTS

We present an advanced multi-way classification algorithm for signature matching, called SigMat, that is trained on a large signature collection from a well-studied cellular context, but can also classify signatures from other cell types by relying on an additional, small collection of signatures representing the target cell type. It uses these 'tuning data' to learn two additional parameters that help adapt its predictions for other cellular contexts. SigMat outperforms other similarity scores and classification methods in identifying the correct label of a query expression profile from as many as 244 or 500 candidate classes (drug treatments) cataloged by the LINCS L1000 project. SigMat retains its high accuracy in cross-cell line applications even when the amount of tuning data is severely limited.

AVAILABILITY AND IMPLEMENTATION

SigMat is available on GitHub at https://github.com/JinfengXiao/SigMat.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3335/6022536/e0e96c38d2bb/bty251f1.jpg