Descriptive statistics and visualization of data from the datasets package with implications for clusterability.

Author information

Brownstein Naomi C, Adolfsson Andreas, Ackerman Margareta

Affiliations

Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, 12902 USF Magnolia Drive, Tampa, FL, 32612, USA.

Department of Behavioral Sciences and Social Medicine, Florida State University, 1115 West Call Street, Tallahassee, FL, 32306-4300, USA.

Publication information

Data Brief. 2019 May 24;25:104004. doi: 10.1016/j.dib.2019.104004. eCollection 2019 Aug.

DOI:10.1016/j.dib.2019.104004

PMID:31317060

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6612012/

Abstract

The manuscript describes and visualizes datasets from the datasets package in the R statistical software, focusing on descriptive statistics and visualizations that provide insight into the clusterability of these datasets. These publicly available datasets are included with R, which can be downloaded at https://www.r-project.org/; documentation is available at https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html. Further information on clusterability is found in the companion to this article (https://doi.org/10.1016/j.patcog.2018.10.026). For each dataset, the variables are briefly described and summarized numerically and graphically by their means, extrema, quartiles, standard deviations and standard errors. Two-dimensional plots are provided for each pair of variables. Original references to the datasets are included when available. Further, each dataset is reduced to a single dimension by each of two methods: pairwise distances and principal component analysis (PCA). For PCA, only the first principal component is used. Histograms of the reduced data are included for every dataset under both methods.
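The abstract lists the numeric summaries and the two one-dimensional reductions without showing how they are computed. The Python sketch below illustrates them under stated assumptions and is not the authors' R code: a dataset is taken to be a list of equal-length numeric rows, "pairwise distances" is assumed to mean Euclidean distances between all pairs of observations, and the variables are assumed to be centred but not scaled before PCA. The helper names `describe`, `pairwise_distances` and `first_pc_scores` are hypothetical, and the first component is found by power iteration, which converges slowly when the two largest covariance eigenvalues are close.

```python
import math
import random
import statistics
from itertools import combinations


def describe(values):
    """Numeric summaries of one variable: mean, extrema, quartiles, SD and SE."""
    sd = statistics.stdev(values)  # sample SD with an n - 1 denominator, like R's sd()
    # method="inclusive" interpolates like R's default quantile() (type 7)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "min": min(values),
        "max": max(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "sd": sd,
        "se": sd / math.sqrt(len(values)),  # standard error of the mean
    }


def pairwise_distances(rows):
    """1-D reduction (assumed Euclidean): distance between every pair of observations."""
    return [math.dist(a, b) for a, b in combinations(rows, 2)]


def first_pc_scores(rows, iterations=500):
    """1-D reduction: projection of each observation onto the first principal component.

    Assumes centred but unscaled variables; the leading eigenvector of the
    sample covariance matrix is found by power iteration. Its sign is arbitrary.
    """
    n, p = len(rows), len(rows[0])
    means = [statistics.fmean(col) for col in zip(*rows)]
    centred = [[x - m for x, m in zip(row, means)] for row in rows]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    rng = random.Random(0)  # fixed seed; a random start is almost surely not orthogonal to the target
    v = [rng.random() for _ in range(p)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # constant data: there is no direction of variation
            return [0.0] * n
        v = [x / norm for x in w]
    return [sum(x * c for x, c in zip(row, v)) for row in centred]


if __name__ == "__main__":
    data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (8.0, 1.0)]
    print(describe([row[0] for row in data]))
    print(pairwise_distances(data))  # n * (n - 1) / 2 = 6 values
    print(first_pc_scores(data))     # one score per observation
```

Plotting a histogram of either returned list gives the kind of one-dimensional view the abstract describes. In R itself, the built-in `dist()`, `prcomp()`, `sd()` and `quantile()` functions cover the same ground.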

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9bea/6612012/9b65a28a4dc0/gr1.jpg

Similar articles

Descriptive statistics and visualization of data from the datasets package with implications for clusterability.

Data Brief. 2019 May 24;25:104004. doi: 10.1016/j.dib.2019.104004. eCollection 2019 Aug.

Sparse clusterability: testing for cluster structure in high dimensions.

BMC Bioinformatics. 2023 Mar 31;24(1):125. doi: 10.1186/s12859-023-05210-6.

Response to letter to the editor from Dr Rahman Shiri: The challenging topic of suicide across occupational groups.

Scand J Work Environ Health. 2018 Jan 1;44(1):108-110. doi: 10.5271/sjweh.3698. Epub 2017 Dec 8.

aPCoA: covariate adjusted principal coordinates analysis.

Bioinformatics. 2020 Jul 1;36(13):4099-4101. doi: 10.1093/bioinformatics/btaa276.

iSFun: an R package for integrative dimension reduction analysis.

Bioinformatics. 2022 May 26;38(11):3134-3135. doi: 10.1093/bioinformatics/btac281.

bigPint: A Bioconductor visualization package that makes big data pint-sized.

PLoS Comput Biol. 2020 Jun 15;16(6):e1007912. doi: 10.1371/journal.pcbi.1007912. eCollection 2020 Jun.

Clusterability and Clustering of Images and Other "Real" High-Dimensional Data.

IEEE Trans Image Process. 2018 Apr;27(4):1927-1938. doi: 10.1109/TIP.2017.2789327.

netReg: network-regularized linear models for biological association studies.

Bioinformatics. 2018 Mar 1;34(5):896-898. doi: 10.1093/bioinformatics/btx677.

TFEA.ChIP: a tool kit for transcription factor binding site enrichment analysis capitalizing on ChIP-seq datasets.

Bioinformatics. 2019 Dec 15;35(24):5339-5340. doi: 10.1093/bioinformatics/btz573.

Regulatory motif finding by logic regression.

Bioinformatics. 2004 Nov 1;20(16):2799-811. doi: 10.1093/bioinformatics/bth333. Epub 2004 May 27.

Cited by

Genome-wide association study of the loci and candidate genes associated with agronomic traits in Amomum villosum Lour.

PLoS One. 2024 Aug 5;19(8):e0306806. doi: 10.1371/journal.pone.0306806. eCollection 2024.

Sparse clusterability: testing for cluster structure in high dimensions.

BMC Bioinformatics. 2023 Mar 31;24(1):125. doi: 10.1186/s12859-023-05210-6.

A meta-analysis of the effects of clay mineral supplementation on alkaline phosphatase, broiler health, and performance.

Poult Sci. 2023 Mar;102(3):102456. doi: 10.1016/j.psj.2022.102456. Epub 2022 Dec 30.