
OASIS: An interpretable, finite-sample valid alternative to Pearson's X² for scientific discovery.

Affiliations

Eric and Wendy Schmidt Center, Broad Institute, Cambridge, MA 02142.

Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02115.

Publication information

Proc Natl Acad Sci U S A. 2024 Apr 9;121(15):e2304671121. doi: 10.1073/pnas.2304671121. Epub 2024 Apr 2.

DOI:10.1073/pnas.2304671121
PMID:38564640
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11009617/
Abstract

Contingency tables, data represented as counts matrices, are ubiquitous across quantitative research and data-science applications. Existing statistical tests are insufficient however, as none are simultaneously computationally efficient and statistically valid for a finite number of observations. In this work, motivated by a recent application in reference-free genomic inference [K. Chaung et al., Cell 186, 5440-5456 (2023)], we develop Optimized Adaptive Statistic for Inferring Structure (OASIS), a family of statistical tests for contingency tables. OASIS constructs a test statistic which is linear in the normalized data matrix, providing closed-form p-value bounds through classical concentration inequalities. In the process, OASIS provides a decomposition of the table, lending interpretability to its rejection of the null. We derive the asymptotic distribution of the OASIS test statistic, showing that these finite-sample bounds correctly characterize the test statistic's p-value up to a variance term. Experiments on genomic sequencing data highlight the power and interpretability of OASIS. Using OASIS, we develop a method that can detect SARS-CoV-2 and Mycobacterium tuberculosis strains de novo, which existing approaches cannot achieve. We demonstrate in simulations that OASIS is robust to overdispersion, a common feature in genomic data like single-cell RNA sequencing, where under accepted noise models OASIS provides good control of the false discovery rate, while Pearson's X² consistently rejects the null. Additionally, we show in simulations that OASIS is more powerful than Pearson's X² in certain regimes, including for some important two group alternatives, which we corroborate with approximate power calculations.
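The abstract describes a statistic that is linear in the normalized table and a closed-form p-value bound from a concentration inequality, but it does not give the formulas. The following is only a minimal sketch of that general idea, not the authors' implementation. It assumes a row weighting `f` with entries in [0, 1] and a column weighting `c`, both fixed before looking at the table, and it uses a Hoeffding-style bound derived under the null that every column shares one row distribution. The function name `linear_table_test` and the exact normalization are illustrative assumptions.

```python
import math


def linear_table_test(table, f, c):
    """Return (S, p_bound) for a counts table (list of rows).

    Assumptions (not taken from the abstract):
      - f[i] in [0, 1] weights row i, c[j] weights column j
      - f and c are chosen independently of `table`
    """
    n_rows, n_cols = len(table), len(table[0])
    col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    total = sum(col_totals)
    if any(n == 0 for n in col_totals):
        raise ValueError("every column needs at least one observation")
    if any(not 0.0 <= w <= 1.0 for w in f):
        raise ValueError("f must lie in [0, 1] for the Hoeffding step")

    # Pooled row frequencies: the estimated common row distribution.
    mu = [sum(table[i]) / total for i in range(n_rows)]

    # S = sum_j c_j / sqrt(n_j) * sum_i f_i * (X_ij - mu_i * n_j):
    # a statistic that is linear in the centered, column-normalized table.
    s = 0.0
    for j in range(n_cols):
        dev = sum(f[i] * (table[i][j] - mu[i] * col_totals[j]) for i in range(n_rows))
        s += c[j] * dev / math.sqrt(col_totals[j])

    # Under the null, S is a zero-mean weighted sum of independent terms in
    # [0, 1] whose squared weights add up to ||c||^2 - (sum_j c_j sqrt(n_j))^2 / M.
    c_norm_sq = sum(w * w for w in c)
    a = sum(c[j] * math.sqrt(col_totals[j]) for j in range(n_cols))
    denom = c_norm_sq - a * a / total
    if denom <= 1e-12:
        return s, 1.0  # c is proportional to sqrt(n): S carries no signal
    p_bound = min(1.0, 2.0 * math.exp(-2.0 * s * s / denom))
    return s, p_bound


# Two columns with opposite row profiles give a large |S| and a tiny bound.
s, p = linear_table_test([[50, 0], [0, 50]], f=[1.0, 0.0], c=[1.0, -1.0])
print(s, p)
```

Because the bound comes from Hoeffding's inequality rather than an asymptotic approximation, it holds for any number of observations, but only when `f` and `c` are not tuned on the same counts they are tested on.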


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e1b/11009617/f12422b178fd/pnas.2304671121fig01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e1b/11009617/fc58368bd695/pnas.2304671121fig02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e1b/11009617/49eaef9deba0/pnas.2304671121fig03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e1b/11009617/3e4c2b0524a9/pnas.2304671121fig04.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e1b/11009617/a75a115cce35/pnas.2304671121fig05.jpg

Similar articles

1
OASIS: An interpretable, finite-sample valid alternative to Pearson's X² for scientific discovery.
Proc Natl Acad Sci U S A. 2024 Apr 9;121(15):e2304671121. doi: 10.1073/pnas.2304671121. Epub 2024 Apr 2.
2
OASIS: An interpretable, finite-sample valid alternative to Pearson's X² for scientific discovery.
bioRxiv. 2023 Nov 3:2023.03.16.533008. doi: 10.1101/2023.03.16.533008.
3
An Extended GFfit Statistic Defined on Orthogonal Components of Pearson's Chi-Square.
Psychometrika. 2023 Mar;88(1):208-240. doi: 10.1007/s11336-022-09866-6. Epub 2022 Jun 3.
4
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
5
Planning Implications Related to Sterilization-Sensitive Science Investigations Associated with Mars Sample Return (MSR).
Astrobiology. 2022 Jun;22(S1):S112-S164. doi: 10.1089/AST.2021.0113. Epub 2022 May 19.
6
Accurate and efficient power calculations for 2 x m tables in unmatched case-control designs.
Stat Med. 2006 Aug 15;25(15):2632-46. doi: 10.1002/sim.2269.
7
Correlation-based inference for linkage disequilibrium with multiple alleles.
Genetics. 2008 Sep;180(1):533-45. doi: 10.1534/genetics.108.089409. Epub 2008 Aug 30.
8
A robust genome-wide scan statistic of the Wellcome Trust Case-Control Consortium.
Biometrics. 2009 Dec;65(4):1115-22. doi: 10.1111/j.1541-0420.2009.01185.x.
9
Symmetry in square contingency tables: tests of hypotheses and confidence interval construction.
J Biopharm Stat. 2001 Feb-May;11(1-2):23-33. doi: 10.1081/BIP-100104195.
10
A logical analysis of null hypothesis significance testing using popular terminology.
BMC Med Res Methodol. 2022 Sep 19;22(1):244. doi: 10.1186/s12874-022-01696-5.

Cited by

1
sc-SPLASH provides ultra-efficient reference-free discovery in barcoded single-cell sequencing.
bioRxiv. 2024 Dec 24:2024.12.24.630263. doi: 10.1101/2024.12.24.630263.
2
Scalable and unsupervised discovery from raw sequencing reads using SPLASH2.
Nat Biotechnol. 2024 Sep 23. doi: 10.1038/s41587-024-02381-2.
3
SPLASH: A statistical, reference-free genomic algorithm unifies biological discovery.
Cell. 2023 Dec 7;186(25):5440-5456.e26. doi: 10.1016/j.cell.2023.10.028.
4
SPLASH2 provides ultra-efficient, scalable, and unsupervised discovery on raw sequencing reads.
bioRxiv. 2024 Mar 30:2023.03.17.533189. doi: 10.1101/2023.03.17.533189.
5
SPLASH: a statistical, reference-free genomic algorithm unifies biological discovery.
bioRxiv. 2023 Jul 31:2022.06.24.497555. doi: 10.1101/2022.06.24.497555.

References

1
SPLASH: A statistical, reference-free genomic algorithm unifies biological discovery.
Cell. 2023 Dec 7;186(25):5440-5456.e26. doi: 10.1016/j.cell.2023.10.028.
2
Correspondence analysis for dimension reduction, batch integration, and visualization of single-cell RNA-seq data.
Sci Rep. 2023 Jan 21;13(1):1197. doi: 10.1038/s41598-022-26434-1.
3
Detection and prevalence of SARS-CoV-2 co-infections during the Omicron variant circulation in France.
Nat Commun. 2022 Oct 23;13(1):6316. doi: 10.1038/s41467-022-33910-9.
4
High fluoroquinolone resistance proportions among multidrug-resistant tuberculosis driven by dominant L2 Mycobacterium tuberculosis clones in the Mumbai Metropolitan Region.
Genome Med. 2022 Aug 22;14(1):95. doi: 10.1186/s13073-022-01076-0.
5
Statistics or biology: the zero-inflation controversy about scRNA-seq data.
Genome Biol. 2022 Jan 21;23(1):31. doi: 10.1186/s13059-022-02601-5.
6
Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model.
Genome Biol. 2019 Dec 23;20(1):295. doi: 10.1186/s13059-019-1861-6.
7
Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.
Genome Biol. 2014;15(12):550. doi: 10.1186/s13059-014-0550-8.
8
Ultrafast and memory-efficient alignment of short DNA sequences to the human genome.
Genome Biol. 2009;10(3):R25. doi: 10.1186/gb-2009-10-3-r25. Epub 2009 Mar 4.
9
The derivation and partition of χ² in certain discrete distributions.
Biometrika. 1949 Jun;36(Pt. 1-2):117-29.