Super Partition: fast, flexible, and interpretable large-scale data reduction in R.

Author information

Queen Katelyn J, Barrett Malcolm, Millstein Joshua

Affiliations

Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, United States.

Department of Health Policy, Stanford University, Stanford, California, United States.

Publication information

PeerJ. 2025 Jan 27;13:e18580. doi: 10.7717/peerj.18580. eCollection 2025.

Abstract

MOTIVATION

As data sets increase in size and complexity with advancing technology, flexible and interpretable data reduction methods that quantify information preservation become increasingly important.

RESULTS

Super Partition is a large-scale approximation of the original Partition data reduction algorithm that lets the user specify the minimum amount of information that must be captured for each input feature. In an initial step, Genie, a fast hierarchical clustering algorithm, groups the features into a super-partition; Partition is then applied to each subset separately, which makes the computation tractable. Applications to high-dimensional data sets show scalability to hundreds of thousands of features with reasonable computation times.
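
The two-stage idea in the paragraph above can be illustrated with a small toy in plain Python. This is a sketch under stated assumptions, not the package's implementation: the coarse stage groups each feature with the first block whose seed feature it correlates with, as a simple stand-in for Genie, and the fine stage merges groups greedily under a minimum squared correlation criterion, as a stand-in for Partition's information measure. The function names (`coarse_blocks`, `reduce_block`, `super_reduce`), the `cutoff` parameter, and the simulated data are all hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def group_mean(cols):
    """Row-wise mean of several columns; used here as the group summary."""
    return [sum(values) / len(cols) for values in zip(*cols)]


def min_r2(cols):
    """Smallest squared correlation between any member and the group mean."""
    if len(cols) == 1:
        return 1.0
    mean = group_mean(cols)
    return min(pearson(c, mean) ** 2 for c in cols)


def coarse_blocks(data, cutoff):
    """Stage 1 (stand-in for Genie): cheap single-pass grouping of features."""
    blocks = []
    for name in data:
        for block in blocks:
            # Compare only against the block's first (seed) feature.
            if abs(pearson(data[name], data[block[0]])) >= cutoff:
                block.append(name)
                break
        else:
            blocks.append([name])
    return blocks


def reduce_block(data, block, threshold):
    """Stage 2 (stand-in for Partition): greedy merging inside one block.

    Two groups are merged only if every member of the merged group keeps
    a squared correlation of at least `threshold` with the group mean.
    """
    groups = [[name] for name in block]
    while True:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                merged = [data[n] for n in groups[i] + groups[j]]
                if min_r2(merged) < threshold:
                    continue
                r = abs(pearson(group_mean([data[n] for n in groups[i]]),
                                group_mean([data[n] for n in groups[j]])))
                if best is None or r > best[0]:
                    best = (r, i, j)
        if best is None:
            return groups
        _, i, j = best
        groups[i] = groups[i] + groups[j]
        del groups[j]


def super_reduce(data, threshold, cutoff):
    """Run the fine stage separately within each coarse block."""
    groups = []
    for block in coarse_blocks(data, cutoff):
        groups.extend(reduce_block(data, block, threshold))
    return groups


# Simulated data (hypothetical): two latent signals with three noisy
# copies each, plus one unrelated noise feature.
rng = random.Random(0)
n_obs = 200
data = {}
for prefix in ("a", "b"):
    latent = [rng.gauss(0, 1) for _ in range(n_obs)]
    for k in range(1, 4):
        data[f"{prefix}{k}"] = [v + rng.gauss(0, 0.1) for v in latent]
data["noise"] = [rng.gauss(0, 1) for _ in range(n_obs)]

groups = super_reduce(data, threshold=0.8, cutoff=0.5)
print(groups)
```

Because the fine stage only compares features inside one coarse block, the pairwise work grows with the block sizes rather than with the full feature count, which is the source of the tractability gain described above. The trade-off is that features separated in the coarse stage are never merged, which is why the result is an approximation of running the fine stage on all features at once.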

AVAILABILITY AND IMPLEMENTATION

Super Partition is a new function within the partition R package, available on the CRAN repository (https://cran.r-project.org/web/packages/partition/index.html).
