MCMC implementation of the optimal Bayesian classifier for non-Gaussian models: model-based RNA-Seq classification.

Authors

Knight Jason M, Ivanov Ivan, Dougherty Edward R

Affiliations

Department of Electrical Engineering, Texas A&M University, 3128 TAMU, College Station, TX 77843, USA.

Department of Veterinary Physiology and Pharmacology, Texas A&M University, 3128 TAMU, College Station, TX 77843, USA.

Publication details

BMC Bioinformatics. 2014 Dec 10;15(1):401. doi: 10.1186/s12859-014-0401-3.

DOI:10.1186/s12859-014-0401-3
PMID:25491122
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4265360/
Abstract

BACKGROUND

Sequencing datasets consist of a finite number of reads which map to specific regions of a reference genome. Most effort in modeling these datasets focuses on the detection of univariate differentially expressed genes. However, for classification, we must consider multiple genes and their interactions.
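The point about multiple genes can be made concrete with a small simulation. The sketch below assumes a Poisson log-normal generator (latent Gaussian log-expression, Poisson read counts scaled by sequencing depth), which is one common way to build a multivariate count model; it is not necessarily the parametrization used in the paper, and `simulate_counts` is a hypothetical helper. Two genes whose latent expression is correlated produce correlated counts, a dependence that per-gene (univariate) differential-expression tests do not use.

```python
import numpy as np


def simulate_counts(n_samples, mu, sigma, depth, rng):
    """Draw read counts from an assumed Poisson log-normal model (illustration only).

    Y[i] ~ N(mu, sigma)                          latent log-expression, one row per sample
    X[i, g] ~ Poisson(depth[i] * exp(Y[i, g]))   observed read count for gene g
    Genes interact only through the off-diagonal entries of `sigma`.
    """
    latent = rng.multivariate_normal(mu, sigma, size=n_samples)  # shape (n, G)
    rates = depth[:, None] * np.exp(latent)                     # per-sample Poisson means
    return rng.poisson(rates)                                   # integer counts, shape (n, G)


rng = np.random.default_rng(0)
n = 2000
depth = np.full(n, 50.0)  # constant depth, so any count correlation comes from the latent layer
mu = np.zeros(2)

correlated = simulate_counts(n, mu, np.array([[0.5, 0.45], [0.45, 0.5]]), depth, rng)
independent = simulate_counts(n, mu, np.diag([0.5, 0.5]), depth, rng)

corr_dep = np.corrcoef(correlated.T)[0, 1]
corr_ind = np.corrcoef(independent.T)[0, 1]
print(f"correlated genes: r = {corr_dep:.2f}; independent genes: r = {corr_ind:.2f}")
```

Depth is held constant here on purpose: if depth varied across samples, it would scale every gene's rate together and induce positive count correlation even between genes with independent latent expression, which is one reason count models carry an explicit depth term.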

RESULTS

Thus, we introduce a hierarchical multivariate Poisson model (MP) and the associated optimal Bayesian classifier (OBC) for classifying samples using sequencing data. Lacking closed-form solutions, we employ a Markov chain Monte Carlo (MCMC) approach to perform classification. We demonstrate superior or equivalent classification performance compared to typical classifiers for two synthetic datasets and over a range of classification problem difficulties. We also introduce the Bayesian minimum mean squared error (MMSE) conditional error estimator and demonstrate its computation over the feature space. In addition, we demonstrate superior or leading-class performance on an RNA-Seq dataset containing two lung cancer tumor types from The Cancer Genome Atlas (TCGA).
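The classification step can be sketched as follows. In the OBC framework, a point x is assigned to the class maximizing c_y p(x | S_y), where c_y is the prior class probability and p(x | S_y) is the effective class-conditional density: the likelihood averaged over the posterior of the model parameters given the class-y training data S_y. When that average has no closed form, it can be estimated by averaging the likelihood over MCMC draws. To stay short, the sketch deliberately replaces the hierarchical multivariate Poisson model with independent per-gene Poisson rates and a Gaussian prior on the log-rates, sampled by random-walk Metropolis. All function names, priors, step sizes, and data are hypothetical, and `conditional_error` reflects one reading of a conditional error estimate (the posterior probability that the label at x is wrong), not necessarily the paper's estimator.

```python
import math

import numpy as np


def log_post(theta, counts, m0, s0):
    """Per-gene log posterior (up to a constant) of log-rates theta.

    Assumed simplified model: X[i, g] ~ Poisson(exp(theta[g])), genes independent
    given the class, prior theta[g] ~ N(m0, s0**2).
    """
    n = counts.shape[0]
    return counts.sum(axis=0) * theta - n * np.exp(theta) - (theta - m0) ** 2 / (2 * s0**2)


def sample_posterior(counts, rng, n_iter=4000, burn=1000, step=0.05, m0=0.0, s0=3.0):
    """Random-walk Metropolis. The posterior factorizes over genes, so each gene
    gets its own accept/reject decision inside one vectorized step."""
    theta = np.log(counts.mean(axis=0) + 0.5)  # start near the maximum-likelihood estimate
    lp = log_post(theta, counts, m0, s0)
    draws = []
    for t in range(n_iter):
        prop = theta + step * rng.standard_normal(theta.shape)
        lp_prop = log_post(prop, counts, m0, s0)
        accept = np.log(rng.uniform(size=theta.shape)) < lp_prop - lp
        theta = np.where(accept, prop, theta)
        lp = np.where(accept, lp_prop, lp)
        if t >= burn:
            draws.append(theta.copy())
    return np.array(draws)  # shape (n_iter - burn, G)


def log_predictive(x, draws):
    """Monte Carlo estimate of log p(x | S_y): the Poisson likelihood of x
    averaged over posterior draws, computed stably in log space."""
    log_fact = sum(math.lgamma(float(k) + 1.0) for k in x)
    ll = (x * draws - np.exp(draws)).sum(axis=1) - log_fact  # one value per draw
    top = ll.max()
    return top + math.log(np.mean(np.exp(ll - top)))


def obc_posteriors(x, draws_by_class, class_priors):
    """Class probabilities c_y p(x | S_y) / sum_j c_j p(x | S_j)."""
    scores = np.array([math.log(c) + log_predictive(x, d)
                       for c, d in zip(class_priors, draws_by_class)])
    w = np.exp(scores - scores.max())
    return w / w.sum()


def obc_predict(x, draws_by_class, class_priors):
    """OBC label: the class maximizing c_y p(x | S_y)."""
    return int(np.argmax(obc_posteriors(x, draws_by_class, class_priors)))


def conditional_error(x, draws_by_class, class_priors):
    """Posterior probability that the OBC label at x is wrong."""
    return 1.0 - obc_posteriors(x, draws_by_class, class_priors).max()


rng = np.random.default_rng(1)
rates = [np.array([5.0, 20.0, 10.0]), np.array([10.0, 20.0, 5.0])]  # hypothetical true rates
train = [rng.poisson(r, size=(30, 3)) for r in rates]
draws_by_class = [sample_posterior(s, rng) for s in train]
priors = [0.5, 0.5]

test_x = np.vstack([rng.poisson(r, size=(100, 3)) for r in rates])
test_y = np.repeat([0, 1], 100)
pred = np.array([obc_predict(x, draws_by_class, priors) for x in test_x])
print("test accuracy:", (pred == test_y).mean())
print("conditional error at x = [7, 20, 7]:",
      round(conditional_error(np.array([7, 20, 7]), draws_by_class, priors), 3))
```

Evaluating `conditional_error` on a grid of x values is one way to map estimated error over the feature space: it is small near each class's typical counts and approaches 0.5 (for two equiprobable classes) on the boundary between them.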

CONCLUSIONS

Through model-based optimal Bayesian classification, we demonstrate superior classification performance on both synthetic and real RNA-Seq datasets. A tutorial video and Python source code are available under an open-source license at http://bit.ly/1gimnss.

Figures (PMC):
Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/60a9/4265360/33f53907cb6c/12859_2014_401_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/60a9/4265360/68a38d7c967f/12859_2014_401_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/60a9/4265360/1c052e02388a/12859_2014_401_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/60a9/4265360/8a49fcc89401/12859_2014_401_Fig4_HTML.jpg