MCMC implementation of the optimal Bayesian classifier for non-Gaussian models: model-based RNA-Seq classification.

Authors

Knight Jason M, Ivanov Ivan, Dougherty Edward R

Affiliations

Department of Electrical Engineering, Texas A&M University, 3128 TAMU, College Station, 77843, TX, USA.

Department of Veterinary Physiology and Pharmacology, Texas A&M University, 3128 TAMU, College Station, 77843, TX, USA.

Publication information

BMC Bioinformatics. 2014 Dec 10;15(1):401. doi: 10.1186/s12859-014-0401-3.

Abstract

BACKGROUND

Sequencing datasets consist of a finite number of reads which map to specific regions of a reference genome. Most effort in modeling these datasets focuses on the detection of univariate differentially expressed genes. However, for classification, we must consider multiple genes and their interactions.

RESULTS

We introduce a hierarchical multivariate Poisson (MP) model and the associated optimal Bayesian classifier (OBC) for classifying samples using sequencing data. Because the model lacks closed-form solutions, we use a Markov chain Monte Carlo (MCMC) approach to perform classification. On two synthetic datasets, and over a range of classification problem difficulties, the OBC performs as well as or better than typical classifiers. We also introduce the Bayesian minimum mean squared error (MMSE) conditional error estimator and demonstrate its computation over the feature space. In addition, we demonstrate superior or leading-class performance on an RNA-Seq dataset containing two lung cancer tumor types from The Cancer Genome Atlas (TCGA).

CONCLUSIONS

Through model-based, optimal Bayesian classification, we demonstrate superior classification performance for both synthetic and real RNA-Seq datasets. A tutorial video and Python source code are available under an open source license at http://bit.ly/1gimnss.
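To make the idea of an MCMC-based optimal Bayesian classifier concrete, the following is a minimal sketch under strong simplifying assumptions. It is not the authors' hierarchical multivariate Poisson model or their released code. It assumes a single gene with Poisson counts, a normal prior on the log-rate (non-conjugate, so the posterior has no closed form), a random-walk Metropolis sampler as the MCMC method, and known, equal class priors. All function names, parameter values, and counts are hypothetical. The classifier picks the class with the largest prior-weighted effective density (the class-conditional density averaged over posterior samples), and one minus that class's posterior probability serves as a simple stand-in for a Bayesian conditional error estimate at a point.

```python
import math
import random


def log_poisson_pmf(x, log_rate):
    """Log of the Poisson pmf at count x with rate exp(log_rate)."""
    return x * log_rate - math.exp(log_rate) - math.lgamma(x + 1)


def log_posterior(log_rate, counts, prior_mean=0.0, prior_sd=3.0):
    """Unnormalized log posterior: normal prior on the log-rate (assumed)
    plus the Poisson log-likelihood of the training counts."""
    log_prior = -0.5 * ((log_rate - prior_mean) / prior_sd) ** 2
    return log_prior + sum(log_poisson_pmf(x, log_rate) for x in counts)


def metropolis(counts, n_samples=4000, burn_in=1000, step=0.3, seed=0):
    """Random-walk Metropolis sampler for the log-rate of one class."""
    rng = random.Random(seed)
    current = math.log(sum(counts) / len(counts) + 0.5)  # rough starting point
    current_lp = log_posterior(current, counts)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, counts)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        if i >= burn_in:
            samples.append(current)
    return samples


def effective_density(x, samples):
    """Class-conditional pmf at x averaged over posterior samples."""
    return sum(math.exp(log_poisson_pmf(x, s)) for s in samples) / len(samples)


def classify(x, samples_by_class, class_priors):
    """Return (label, estimated conditional error at x).

    The label maximizes prior * effective density; the error estimate is
    one minus the posterior probability of the chosen label."""
    weighted = [c * effective_density(x, s)
                for c, s in zip(class_priors, samples_by_class)]
    total = sum(weighted)
    posteriors = [w / total for w in weighted]
    label = max(range(len(posteriors)), key=posteriors.__getitem__)
    return label, 1.0 - posteriors[label]


# Hypothetical training counts for two classes (illustrative values only).
train_counts = [[3, 5, 4, 6, 2, 4], [12, 15, 9, 14, 11, 13]]
posterior_samples = [metropolis(c, seed=k) for k, c in enumerate(train_counts)]
priors = [0.5, 0.5]  # assumed known and equal

for x_new in (4, 8, 13):
    label, err = classify(x_new, posterior_samples, priors)
    print(f"count={x_new} -> class {label}, estimated conditional error {err:.3f}")
```

Averaging the pmf over posterior draws, rather than plugging in a single point estimate of the rate, is what separates this Bayesian construction from a plug-in classifier. A multivariate version would need a joint model over genes, which this sketch does not attempt.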

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/60a9/4265360/33f53907cb6c/12859_2014_401_Fig1_HTML.jpg
