Testing high-dimensional multinomials with applications to text analysis.

Authors

Cai T Tony, Ke Zheng T, Turner Paxton

Affiliations

Department of Statistics and Data Science, University of Pennsylvania, Philadelphia, PA, USA.

Department of Statistics, Harvard University, Cambridge, MA, USA.

Publication information

J R Stat Soc Series B Stat Methodol. 2024 Feb 28;86(4):922-942. doi: 10.1093/jrsssb/qkae003. eCollection 2024 Sep.

DOI:10.1093/jrsssb/qkae003
PMID:39279913
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11398885/
Abstract

Motivated by applications in text mining and discrete distribution inference, we test for equality of probability mass functions of groups of high-dimensional multinomial distributions. Special cases of this problem include global testing for topic models, two-sample testing in authorship attribution, and closeness testing for discrete distributions. A test statistic, which is shown to have an asymptotic standard normal distribution under the null hypothesis, is proposed. This parameter-free limiting null distribution holds true without requiring identical multinomial parameters within each group or equal group sizes. The optimal detection boundary for this testing problem is established, and the proposed test is shown to achieve this optimal detection boundary across the entire parameter space of interest. The proposed method is demonstrated in simulation studies and applied to analyse two real-world datasets to examine, respectively, variation among customer reviews of Amazon movies and the diversity of statistical paper abstracts.
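To make the two-sample special case concrete, here is a minimal sketch of a closeness test between two groups of multinomial count vectors (for example, word counts of documents by two candidate authors). It is not the statistic proposed in the paper, which the abstract does not spell out. As stated assumptions, it uses a simple squared Euclidean distance between pooled word frequencies and calibrates it by permuting group labels rather than by an asymptotic normal limit. The permutation step also assumes documents are exchangeable under the null, which is stronger than the setting described in the abstract. All function names are hypothetical.

```python
import random


def sample_multinomial(rng, n_draws, pmf):
    """Draw one multinomial count vector with n_draws trials from pmf."""
    counts = [0] * len(pmf)
    for idx in rng.choices(range(len(pmf)), weights=pmf, k=n_draws):
        counts[idx] += 1
    return counts


def group_frequencies(docs):
    """Pool the count vectors of one group into a single empirical PMF."""
    p = len(docs[0])
    totals = [sum(doc[j] for doc in docs) for j in range(p)]
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


def distance_statistic(group_a, group_b):
    """Squared Euclidean distance between the two pooled empirical PMFs."""
    freq_a = group_frequencies(group_a)
    freq_b = group_frequencies(group_b)
    return sum((x - y) ** 2 for x, y in zip(freq_a, freq_b))


def permutation_p_value(group_a, group_b, n_perm=500, seed=0):
    """Permutation p-value: reshuffle documents between the two groups."""
    rng = random.Random(seed)
    observed = distance_statistic(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if distance_statistic(pooled[:n_a], pooled[n_a:]) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    p = 50  # vocabulary size (illustrative)
    uniform = [1 / p] * p
    # All mass on the first half of the vocabulary.
    skewed = [2 / p if j < p // 2 else 0.0 for j in range(p)]

    docs_a = [sample_multinomial(rng, 100, uniform) for _ in range(20)]
    docs_b = [sample_multinomial(rng, 100, skewed) for _ in range(20)]
    docs_c = [sample_multinomial(rng, 100, uniform) for _ in range(20)]

    print("different PMFs:", permutation_p_value(docs_a, docs_b))
    print("same PMF:      ", permutation_p_value(docs_a, docs_c))
```

The permutation calibration is chosen here only because it needs no distributional theory. The abstract's point is that the proposed statistic instead has a parameter-free standard normal limit under the null, without requiring identical multinomial parameters within each group or equal group sizes.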
