Selection Sampling from Large Data Sets for Targeted Inference in Mixture Modeling.

Authors

Manolopoulou Ioanna, Chan Cliburn, West Mike

Affiliation

Department of Statistical Science, Duke University, Durham, NC,

Publication

Bayesian Anal. 2010;5(3):1-22.

PMID:20865145
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2943396/

Abstract

One of the challenges in using Markov chain Monte Carlo for model analysis in studies with very large datasets is the need to scan through the whole data at each iteration of the sampler, which can be computationally prohibitive. Several approaches have been developed to address this, typically drawing computationally manageable subsamples of the data. Here we consider the specific case where most of the data from a mixture model provides little or no information about the parameters of interest, and we aim to select subsamples such that the information extracted is most relevant. The motivating application arises in flow cytometry, where several measurements from a vast number of cells are available. Interest lies in identifying specific rare cell subtypes and characterizing them according to their corresponding markers. We present a Markov chain Monte Carlo approach where an initial subsample of the full dataset is used to guide selection sampling of a further set of observations targeted at a scientifically interesting, low probability region. We define a Sequential Monte Carlo strategy in which the targeted subsample is augmented sequentially as estimates improve, and introduce a stopping rule for determining the size of the targeted subsample. An example from flow cytometry illustrates the ability of the approach to increase the resolution of inferences for rare cell subtypes.
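
The selection step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes one-dimensional data, a two-component Gaussian mixture with a 1% rare component, and a Gaussian kernel as the weight function that marks the region of interest. The names `selection_sample` and `gaussian_kernel`, and all numeric settings, are hypothetical. Each observation is kept independently with probability equal to its weight, so the retained subsample concentrates on the low-probability region, whereas a simple random subsample mostly reflects the dominant component.

```python
import numpy as np


def gaussian_kernel(center, scale):
    """Return a weight function with values in (0, 1], peaking at `center`.

    Hypothetical choice of weight function for this illustration.
    """
    def weight(x):
        return np.exp(-0.5 * ((x - center) / scale) ** 2)
    return weight


def selection_sample(data, weight_fn, rng):
    """Keep each observation independently with probability weight_fn(x).

    Returns the indices of the retained observations.
    """
    probs = weight_fn(data)
    keep = rng.random(data.shape[0]) < probs
    return np.flatnonzero(keep)


rng = np.random.default_rng(0)
n = 200_000

# Simulated data (assumption): 1% rare component near 4, 99% bulk near 0.
is_rare = rng.random(n) < 0.01
data = np.where(is_rare, rng.normal(4.0, 0.5, n), rng.normal(0.0, 1.0, n))

# Simple random subsample versus a subsample targeted at the region near 4.
random_idx = rng.choice(n, size=2_000, replace=False)
targeted_idx = selection_sample(data, gaussian_kernel(4.0, 0.5), rng)

rare_frac_random = is_rare[random_idx].mean()
rare_frac_targeted = is_rare[targeted_idx].mean()
print(f"rare fraction, random subsample:   {rare_frac_random:.3f}")
print(f"rare fraction, targeted subsample: {rare_frac_targeted:.3f}")
```

Because the targeted subsample is drawn from a density proportional to the weight times the mixture density, any model fitted to it would need to account for that reweighting. The sketch covers only the selection step and omits the sequential augmentation and the stopping rule mentioned in the abstract.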
