Brownstein Naomi C, Adolfsson Andreas, Ackerman Margareta
Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, 12902 USF Magnolia Drive, Tampa, FL, 32612, USA.
Department of Behavioral Sciences and Social Medicine, Florida State University, 1115 West Call Street, Tallahassee, FL, 32306-4300, USA.
Data Brief. 2019 May 24;25:104004. doi: 10.1016/j.dib.2019.104004. eCollection 2019 Aug.
The manuscript describes and visualizes datasets from the datasets package in the statistical software R, focusing on descriptive statistics and visualizations that provide insight into the clusterability of these datasets. These publicly available datasets are included in the R software system, which can be downloaded at https://www.r-project.org/; documentation is provided at https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html. Further information on clusterability is found in the companion to this article (https://doi.org/10.1016/j.patcog.2018.10.026). For each dataset, the variables are briefly described and summarized graphically and numerically by their means, extrema, quartiles, standard deviations and standard errors. Two-dimensional plots are provided for each pair of variables. Original references to the datasets are included when available. Further, each dataset is reduced to a single dimension by each of two methods: pairwise distances and principal component analysis (PCA). For PCA, only the first principal component is used. Histograms of the reduced data are included for every dataset under both methods.
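The summary statistics and the two one-dimensional reductions described above can be sketched as follows. This is a minimal, standard-library-only Python illustration under explicit assumptions, not the authors' code: the pairwise reduction uses Euclidean distance (the abstract does not name a metric), PCA is run on centered but unscaled variables, the standard deviation uses the n - 1 denominator, and histograms are summarized as equal-width bin counts. All function names (`summarize`, `pairwise_distances`, `first_pc_scores`, `histogram_counts`) are hypothetical. Within R itself, the built-in `summary()`, `sd()`, `dist()`, `prcomp()` and `hist()` functions would be the natural equivalents.

```python
import math
import random
import statistics
from itertools import combinations


def summarize(values):
    """Descriptive statistics named in the abstract, for one variable."""
    n = len(values)
    # 'inclusive' = linear interpolation between order statistics (R's default quantile type 7)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    sd = statistics.stdev(values)  # sample SD with n - 1 denominator (assumption)
    return {
        "mean": statistics.fmean(values),
        "min": min(values),
        "max": max(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "sd": sd,
        "se": sd / math.sqrt(n),  # standard error of the mean
    }


def pairwise_distances(rows):
    """Reduce n points to their n(n-1)/2 pairwise Euclidean distances (metric assumed)."""
    return [math.dist(a, b) for a, b in combinations(rows, 2)]


def first_pc_scores(rows, iters=1000, tol=1e-12, seed=0):
    """Project centered (unscaled) rows onto the first principal component.

    The component is found by power iteration on the sample covariance matrix;
    its sign is arbitrary, as with any PCA routine.
    """
    n, p = len(rows), len(rows[0])
    means = [statistics.fmean(col) for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    rng = random.Random(seed)
    # Random start vector: almost surely not orthogonal to the first component.
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # zero covariance, e.g. constant data: nothing to project
            return [0.0] * n
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    return [sum(x * c for x, c in zip(row, v)) for row in centered]


def histogram_counts(values, bins=10):
    """Equal-width bin counts, i.e. the numbers behind a histogram of reduced data."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against all-equal values
    counts = [0] * bins
    for x in values:
        counts[min(int((x - lo) / width), bins - 1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy data (not one of the R datasets): two loose groups.
    data = [(1.0, 1.2), (1.1, 0.9), (0.8, 1.0), (5.0, 5.1), (5.2, 4.8), (4.9, 5.3)]
    for k, column in enumerate(zip(*data)):
        print(f"variable {k}:", summarize(column))
    print("distance histogram:", histogram_counts(pairwise_distances(data), bins=5))
    print("PC1 histogram:", histogram_counts(first_pc_scores(data), bins=5))
```

The sketch avoids third-party packages so it runs anywhere; with NumPy available, `numpy.linalg.eigh` on the covariance matrix would replace the power iteration. For well-separated groups like the toy data, both reductions tend to produce visibly bimodal bin counts, which is the kind of pattern the histograms are meant to reveal.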