Model-Based Clustering With Data Correction For Removing Artifacts In Gene Expression Data.

Author information

Young William Chad, Raftery Adrian E, Yeung Ka Yee

Affiliations

Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195.

Institute of Technology, University of Washington Tacoma, Campus Box 358426, 1900 Commerce Street, Tacoma, WA 98402.

Publication information

Ann Appl Stat. 2017 Dec;11(4):1998-2026. doi: 10.1214/17-AOAS1051. Epub 2017 Dec 28.

Abstract

The NIH Library of Integrated Network-based Cellular Signatures (LINCS) contains gene expression data from over a million experiments, using Luminex Bead technology. Only 500 colors are used to measure the expression levels of the 1,000 landmark genes, so each color is shared by a pair of genes and the data for each pair must be deconvolved. The raw data are sometimes inadequate for reliable deconvolution, leading to artifacts in the final processed data. These artifacts include the expression levels of paired genes being flipped or assigned the same value, and clusters of values that do not lie at the true expression level. We propose a new method called model-based clustering with data correction (MCDC) that is able to identify and correct these three kinds of artifacts simultaneously. We show that MCDC improves the resulting gene expression data in terms of agreement with external baselines, as well as improving results from subsequent analysis.
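
To make the "flipped pair" artifact concrete, the following is a minimal sketch, not the authors' MCDC implementation. It assumes a single two-dimensional Gaussian for one gene pair and covers only the flip artifact, whereas the abstract describes MCDC as handling three kinds of artifacts within a model-based clustering framework. The function name `correct_flipped_pairs`, its parameters, and the simulated data are hypothetical and chosen purely for illustration.

```python
import numpy as np

def correct_flipped_pairs(xy, max_iter=20):
    """Illustrative flip correction for one gene pair (not the published MCDC).

    xy: (n, 2) array, one row per experiment, columns are the two paired genes.
    Returns (corrected, flipped), where flipped is a boolean mask of rows
    whose two values were swapped back.
    """
    xy = np.asarray(xy, dtype=float)
    swapped = xy[:, ::-1]                      # each row with its two values exchanged
    flipped = np.zeros(len(xy), dtype=bool)    # start by assuming no row is flipped
    for _ in range(max_iter):
        # Fit one Gaussian to the data under the current flip assignment.
        current = np.where(flipped[:, None], swapped, xy)
        mean = current.mean(axis=0)
        cov = np.cov(current, rowvar=False) + 1e-6 * np.eye(2)  # small ridge for stability
        prec = np.linalg.inv(cov)
        # Squared Mahalanobis distance of each row as observed and as swapped.
        d_orig = xy - mean
        d_swap = swapped - mean
        m_orig = np.einsum("ij,jk,ik->i", d_orig, prec, d_orig)
        m_swap = np.einsum("ij,jk,ik->i", d_swap, prec, d_swap)
        # With a shared covariance, a smaller distance means a higher Gaussian density.
        new_flipped = m_swap < m_orig
        if np.array_equal(new_flipped, flipped):
            break
        flipped = new_flipped
    corrected = np.where(flipped[:, None], swapped, xy)
    return corrected, flipped

# Simulated example (assumed values): 200 experiments, the first 20 flipped.
rng = np.random.default_rng(0)
clean = rng.normal(loc=[4.0, 8.0], scale=0.3, size=(200, 2))
truth = np.zeros(200, dtype=bool)
truth[:20] = True
observed = np.where(truth[:, None], clean[:, ::-1], clean)
corrected, flipped = correct_flipped_pairs(observed)
```

The sketch alternates between fitting the Gaussian and reassigning flips, which loosely mirrors the idea of estimating the model and the corrections together; a real analysis would need a mixture of several clusters and the other two artifact types, which this example does not attempt.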
