High-Performance Statistical Computing in the Computing Environments of the 2020s.

Authors

Ko Seyoon, Zhou Hua, Zhou Jin J, Won Joong-Ho

Affiliations

Department of Biostatistics, UCLA Fielding School of Public Health, Los Angeles, California 90095, USA.

Department of Medicine, UCLA David Geffen School of Medicine, Los Angeles, California 90095, USA, and Department of Epidemiology and Biostatistics, Mel and Enid Zuckerman College of Public Health, University of Arizona, Tucson, Arizona 85724, USA.

Publication information

Stat Sci. 2022 Nov;37(4):494-518. doi: 10.1214/21-sts835. Epub 2022 Oct 13.

DOI:10.1214/21-sts835
PMID:37168541
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10168006/
Abstract

Technological advances in the past decade, hardware and software alike, have made access to high-performance computing (HPC) easier than ever. We review these advances from a statistical computing perspective. Cloud computing makes access to supercomputers affordable. Deep learning software libraries make programming statistical algorithms easy and enable users to write code once and run it anywhere, from a laptop to a workstation with multiple graphics processing units (GPUs) or a supercomputer in a cloud. Highlighting how these developments benefit statisticians, we review recent optimization algorithms that are useful for high-dimensional models and can harness the power of HPC. Code snippets are provided to demonstrate the ease of programming. We also provide an easy-to-use distributed matrix data structure suitable for HPC. Employing this data structure, we illustrate various statistical applications including large-scale positron emission tomography and ℓ1-regularized Cox regression. Our examples easily scale up to an 8-GPU workstation and a 720-CPU-core cluster in a cloud. As a case in point, we analyze the onset of type-2 diabetes from the UK Biobank with 200,000 subjects and about 500,000 single nucleotide polymorphisms using the HPC ℓ1-regularized Cox regression. Fitting this half-million-variate model takes less than 45 minutes and reconfirms known associations. To our knowledge, this is the first demonstration of the feasibility of penalized regression of survival outcomes at this scale.
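The ℓ1-regularized Cox regression named in the abstract can be illustrated with a small, single-machine sketch. It assumes the Breslow form of the partial likelihood with no tied event times and fits the model with proximal gradient descent (a gradient step followed by soft-thresholding), one standard algorithm for lasso-type problems; the abstract does not say which algorithm or step-size rule the authors use, and their code builds on deep learning libraries rather than pure Python. The function names `cox_loglik_and_grad`, `soft_threshold` and `l1_cox_proximal_gradient` and the synthetic data are hypothetical. The gradient is written as the matrix-vector product X^T(δ − π), where δ holds the event indicators and π_j = w_j Σ_{i: t_i ≤ t_j} δ_i / W_i, with w_j = exp(x_j^T β) and W_i the sum of w_j over the risk set of subject i.

```python
import math
import random


def cox_loglik_and_grad(X, time, event, beta):
    """Breslow log partial likelihood and its gradient (assumes no tied times).

    X: list of n rows, each a list of p covariates; time: survival or
    censoring times; event: 1 if the event was observed, 0 if censored.
    """
    n = len(X)
    eta = [sum(x * b for x, b in zip(row, beta)) for row in X]
    w = [math.exp(e) for e in eta]
    # Visit subjects from latest to earliest time, so the running sum equals
    # the risk-set total W_i = sum of w_j over subjects with t_j >= t_i.
    order = sorted(range(n), key=lambda i: time[i], reverse=True)
    W = [0.0] * n
    running = 0.0
    for i in order:
        running += w[i]
        W[i] = running
    loglik = sum(event[i] * (eta[i] - math.log(W[i])) for i in range(n))
    # pi_j = w_j * (sum of event_i / W_i over subjects with t_i <= t_j),
    # accumulated from earliest to latest time.
    pi = [0.0] * n
    acc = 0.0
    for j in reversed(order):
        acc += event[j] / W[j]
        pi[j] = w[j] * acc
    # Gradient of the log partial likelihood: X^T (event - pi).
    resid = [event[j] - pi[j] for j in range(n)]
    grad = [sum(X[j][k] * resid[j] for j in range(n)) for k in range(len(beta))]
    return loglik, grad


def soft_threshold(z, a):
    """Proximal operator of a * |.|: shrink z toward zero by a."""
    return math.copysign(max(abs(z) - a, 0.0), z)


def l1_cox_proximal_gradient(X, time, event, lam, n_iter=500):
    """Minimize -loglik/n + lam * ||beta||_1 by proximal gradient descent."""
    n, p = len(X), len(X[0])
    # Fixed step 1/L, where L = max_i ||x_i||^2 bounds the Lipschitz constant
    # of the gradient of -loglik/n, so the objective never increases.
    lip = max(sum(x * x for x in row) for row in X) or 1.0
    step = 1.0 / lip
    beta = [0.0] * p
    history = []
    for _ in range(n_iter):
        loglik, grad = cox_loglik_and_grad(X, time, event, beta)
        history.append(-loglik / n + lam * sum(abs(b) for b in beta))
        # Gradient step on the smooth part, then soft-threshold (prox of l1).
        beta = [soft_threshold(b + step * g / n, step * lam)
                for b, g in zip(beta, grad)]
    return beta, history


if __name__ == "__main__":
    # Hypothetical synthetic data: only the first two covariates matter.
    random.seed(0)
    n, p = 200, 5
    true_beta = [1.0, -1.0, 0.0, 0.0, 0.0]
    X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    time, event = [], []
    for row in X:
        hazard = math.exp(sum(x * b for x, b in zip(row, true_beta)))
        t_event = random.expovariate(hazard)
        t_censor = random.expovariate(0.3)
        time.append(min(t_event, t_censor))
        event.append(1 if t_event <= t_censor else 0)
    beta_hat, history = l1_cox_proximal_gradient(X, time, event, lam=0.05)
    print("estimated beta:", [round(b, 3) for b in beta_hat])
    print("objective: start %.4f, end %.4f" % (history[0], history[-1]))
```

In this sketch each iteration costs one sort plus two O(np) matrix-vector products, η = Xβ and X^T(δ − π); the sort could be hoisted out of the loop because the times never change. At the scale quoted in the abstract (200,000 × 500,000, about 10^11 matrix entries), those two products are the natural candidates for a distributed matrix structure or GPUs.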

Similar articles

1. High-Performance Statistical Computing in the Computing Environments of the 2020s.
   Stat Sci. 2022 Nov;37(4):494-518. doi: 10.1214/21-sts835. Epub 2022 Oct 13.
2. TeraChem Cloud: A High-Performance Computing Service for Scalable Distributed GPU-Accelerated Electronic Structure Calculations.
   J Chem Inf Model. 2020 Apr 27;60(4):2126-2137. doi: 10.1021/acs.jcim.9b01152. Epub 2020 Apr 20.
3. Edge, Fog, and Cloud Against Disease: The Potential of High-Performance Cloud Computing for Pharma Drug Discovery.
   Methods Mol Biol. 2024;2716:181-202. doi: 10.1007/978-1-0716-3449-3_8.
4. High performance computing for deformable image registration: towards a new paradigm in adaptive radiotherapy.
   Med Phys. 2008 Aug;35(8):3546-53. doi: 10.1118/1.2948318.
5. Evaluation of Clustering Algorithms on GPU-Based Edge Computing Platforms.
   Sensors (Basel). 2020 Nov 6;20(21):6335. doi: 10.3390/s20216335.
6. Accelerating epistasis analysis in human genetics with consumer graphics hardware.
   BMC Res Notes. 2009 Jul 24;2:149. doi: 10.1186/1756-0500-2-149.
7. Accelerating Coupled-Cluster Calculations with GPUs: An Implementation of the Density-Fitted CCSD(T) Approach for Heterogeneous Computing Architectures Using OpenMP Directives.
   J Chem Theory Comput. 2023 Nov 14;19(21):7640-7657. doi: 10.1021/acs.jctc.3c00876. Epub 2023 Oct 25.
8. A performance/cost evaluation for a GPU-based drug discovery application on volunteer computing.
   Biomed Res Int. 2014;2014:474219. doi: 10.1155/2014/474219. Epub 2014 Jun 15.
9. Large-scale neural circuit mapping data analysis accelerated with the graphical processing unit (GPU).
   J Neurosci Methods. 2015 Jan 15;239:1-10. doi: 10.1016/j.jneumeth.2014.09.022. Epub 2014 Sep 30.
10. Accelerating single molecule localization microscopy through parallel processing on a high-performance computing cluster.
   J Microsc. 2019 Feb;273(2):148-160. doi: 10.1111/jmi.12772. Epub 2018 Dec 3.

Cited by

1. Biomass production, growth performance and character relationship of six varieties of Napier ( L schumach.) grass at Teppi south west Ethiopia.
   Heliyon. 2024 Nov 19;10(23):e40528. doi: 10.1016/j.heliyon.2024.e40528. eCollection 2024 Dec 15.
2. Massive Parallelization of Massive Sample-size Survival Analysis.
   J Comput Graph Stat. 2024;33(1):289-302. doi: 10.1080/10618600.2023.2213279. Epub 2023 Jun 26.
3. Multivariate genome-wide association analysis by iterative hard thresholding.
   Bioinformatics. 2023 Apr 3;39(4). doi: 10.1093/bioinformatics/btad193.
4. Computationally scalable regression modeling for ultrahigh-dimensional omics data with ParProx.
   Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab256.
5. Modern simulation utilities for genetic analysis.
   BMC Bioinformatics. 2021 May 3;22(1):228. doi: 10.1186/s12859-021-04086-8.
