Fast Probabilistic Whitening Transformation for Ultra-High Dimensional Genetic Data.

Authors

Hoffman Gabriel E, Roussos Panos

Affiliations

Center for Disease Neurogenomics, Department of Psychiatry, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.

Center for Precision Medicine and Translational Therapeutics, Mental Illness Research, Education and Clinical Center VISN2, James J. Peters VA Medical Center, Bronx, NY, USA.

Publication

bioRxiv. 2025 Sep 4:2025.09.01.673591. doi: 10.1101/2025.09.01.673591.

DOI: 10.1101/2025.09.01.673591
PMID: 40949945
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12424976/
Abstract

Statistical methods often make assumptions about independence between the samples or features of a dataset. Yet correlation structure is ubiquitous in real data, so these assumptions are often not met in practice. Whitening transformations are widely applied to remove this correlation structure. Existing approaches to whitening are based on standard linear algebra, rather than a probabilistic model, and application to high-dimensional datasets with n samples and p features is problematic as p approaches or exceeds n. Moreover, the computational time becomes prohibitive since the naive transform is cubic in p. Here we propose a probabilistic model for data whitening and examine its properties based on first principles as p increases. We demonstrate the statistical properties of the probabilistic model and derive a remarkably efficient algorithm that is linear instead of cubic time in the number of features. We examine the out-of-sample performance of the probabilistic whitening model on simulated data, as well as real gene expression and genotype data. In an application to impute z-statistics from unobserved genetic variants from a genome-wide association study of schizophrenia, the probabilistic whitening transformation, implemented in our open-source R package decorrelate, had the lowest mean square error while being up to an order of magnitude faster than other methods.
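
To make the cubic-versus-linear contrast in the abstract concrete, here is a minimal Python sketch of one generic way to whiten in time linear in p. It is not the decorrelate package's API, and it may differ from the authors' algorithm. It assumes a ridge-style regularized covariance Sigma = (1 - lam) * S + lam * I, where S is the sample covariance and the shrinkage weight lam is fixed by hand. The function names `fit_lowrank_whitener` and `whiten` are hypothetical. The idea is that a thin SVD of the n x p data matrix yields Sigma^{-1/2} implicitly, so no p x p matrix is ever formed.

```python
import numpy as np

def fit_lowrank_whitener(X, lam):
    """Fit an implicit Sigma^{-1/2} for Sigma = (1 - lam) * S + lam * I.

    X is an (n, p) data matrix and S its sample covariance.
    lam is an assumed, user-chosen shrinkage weight in (0, 1].
    """
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must lie in (0, 1]")
    n, _ = X.shape
    # Center the columns and scale so that Xc.T @ Xc equals S.
    Xc = (X - X.mean(axis=0)) / np.sqrt(n - 1)
    # Thin SVD: V has shape (p, k) with k = min(n, p).
    # For n < p this costs O(n^2 p), which is linear in p.
    _, d, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Eigenvalues of Sigma along the directions spanned by V.
    eigvals = (1.0 - lam) * d**2 + lam
    return Vt.T, 1.0 / np.sqrt(eigvals)

def whiten(Y, V, inv_sqrt_eigvals, lam):
    """Compute Y @ Sigma^{-1/2} in O(p k) per row of Y."""
    proj = Y @ V                                   # coordinates in span(V)
    in_span = (proj * inv_sqrt_eigvals) @ V.T      # rescale within span(V)
    # Outside span(V), every eigenvalue of Sigma equals lam.
    off_span = (Y - proj @ V.T) / np.sqrt(lam)
    return in_span + off_span
```

A naive implementation would instead eigendecompose the full p x p matrix Sigma, which is cubic in p. In this sketch, the two routes agree up to floating-point error on small inputs where both are feasible. Choosing lam from the data, which a probabilistic model would be expected to do, is deliberately left out here.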

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1d7f/12424976/11095992761d/nihpp-2025.09.01.673591v1-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1d7f/12424976/5295d2c8d3dc/nihpp-2025.09.01.673591v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1d7f/12424976/bb5f1a533d33/nihpp-2025.09.01.673591v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1d7f/12424976/5192e6459ef9/nihpp-2025.09.01.673591v1-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1d7f/12424976/5b891ffc0e5f/nihpp-2025.09.01.673591v1-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1d7f/12424976/f67d48d1d20e/nihpp-2025.09.01.673591v1-f0005.jpg
