Finite mixtures of matrix variate Poisson-log normal distributions for three-way count data.

Affiliations

Department of Mathematics and Statistics, University of Guelph, Guelph, ON N1G 2W1, Canada.

Department of Molecular and Cellular Biology, University of Guelph, Guelph, ON N1G 2W1, Canada.

Publication information

Bioinformatics. 2023 May 4;39(5). doi: 10.1093/bioinformatics/btad167.

DOI: 10.1093/bioinformatics/btad167
PMID: 37018147
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10159656/

Abstract

MOTIVATION

Three-way data structures, characterized by three entities (the units, the variables and the occasions), are frequent in biological studies. In RNA sequencing, three-way data structures are obtained when high-throughput transcriptome sequencing data are collected for n genes across p conditions at r occasions. Matrix variate distributions offer a natural way to model three-way data, and mixtures of matrix variate distributions can be used to cluster three-way data. Clustering of gene expression data is carried out as a means of discovering gene co-expression networks.
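As a minimal illustration of the three-way layout described above, the sketch below stores counts for n genes as an array of shape (n, r, p), so that each gene is an r x p matrix (occasions by conditions). The sizes and the placeholder Poisson counts are assumptions for illustration only, not data from the paper.

```python
import numpy as np

# Illustrative sizes (assumed): n genes, r occasions, p conditions.
n, r, p = 5, 2, 3

rng = np.random.default_rng(0)

# Placeholder read counts; real data would come from an RNA-seq pipeline.
counts = rng.poisson(lam=10.0, size=(n, r, p))

# Each unit (gene) is an r x p matrix: rows are occasions, columns are conditions.
gene0 = counts[0]

# Flattening each matrix gives the two-way (n x rp) layout that a
# multivariate model would use; the row/column structure is then lost.
flat = counts.reshape(n, r * p)
```

Keeping the array three-dimensional is what lets a matrix variate model treat the occasion and condition dimensions separately.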

RESULTS

In this work, a mixture of matrix variate Poisson-log normal distributions is proposed for clustering read counts from RNA sequencing. Modelling the matrix variate structure uses the full information on the conditions and occasions of the RNA sequencing dataset simultaneously and reduces the number of covariance parameters to be estimated. We propose three frameworks for parameter estimation: a Markov chain Monte Carlo-based approach, a variational Gaussian approximation-based approach, and a hybrid approach. Various information criteria are used for model selection. The models are applied to both real and simulated data, and we demonstrate that the proposed approaches can recover the underlying cluster structure in both cases. In simulation studies, where the true model parameters are known, our proposed approach shows good parameter recovery.
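The generative idea behind such a mixture can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes a common formulation in which each gene's latent r x p matrix is matrix normal with mean M, an r x r covariance Phi across occasions and a p x p covariance Omega across conditions, and counts are Poisson with mean exp(latent). Library-size normalization is omitted, and all function names are hypothetical.

```python
import numpy as np

def sample_matrix_normal(rng, M, Phi, Omega):
    """Draw one r x p matrix with mean M, row covariance Phi, column covariance Omega."""
    A = np.linalg.cholesky(Phi)    # Phi = A A^T (r x r)
    B = np.linalg.cholesky(Omega)  # Omega = B B^T (p x p)
    Z = rng.standard_normal(M.shape)
    return M + A @ Z @ B.T

def simulate_mixture(rng, n, weights, Ms, Phis, Omegas):
    """Simulate n count matrices from a G-component mixture (assumed formulation)."""
    labels = rng.choice(len(weights), size=n, p=weights)
    r, p = Ms[0].shape
    Y = np.empty((n, r, p), dtype=np.int64)
    for i, g in enumerate(labels):
        theta = sample_matrix_normal(rng, Ms[g], Phis[g], Omegas[g])
        Y[i] = rng.poisson(np.exp(theta))  # counts given the latent log-means
    return Y, labels

def n_cov_params(r, p):
    """Free covariance parameters per component: unstructured vs. matrix variate.

    The matrix variate count ignores the single scale constraint needed to
    identify Phi and Omega separately.
    """
    unstructured = (r * p) * (r * p + 1) // 2
    matrix_variate = r * (r + 1) // 2 + p * (p + 1) // 2
    return unstructured, matrix_variate

rng = np.random.default_rng(1)
r, p = 2, 3
Ms = [np.full((r, p), 1.0), np.full((r, p), 4.0)]  # two well-separated components
Phis = [0.1 * np.eye(r)] * 2
Omegas = [0.1 * np.eye(p)] * 2
Y, labels = simulate_mixture(rng, 50, [0.5, 0.5], Ms, Phis, Omegas)
print(n_cov_params(r, p))  # (21, 9)
```

For r = 2 and p = 3, an unstructured covariance on the 6 flattened entries has 21 free parameters, while the two smaller covariance matrices have 3 + 6 = 9, which illustrates the reduction mentioned in the abstract.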

AVAILABILITY AND IMPLEMENTATION

The R package for this work is available on GitHub at https://github.com/anjalisilva/mixMVPLN and is released under the open-source MIT license.
