Univariate description and bivariate statistical inference: the first step delving into data.

Affiliation

Department of Critical Care Medicine, Jinhua Municipal Central Hospital, Jinhua Hospital of Zhejiang University, Jinhua 321000, China.

Publication information

Ann Transl Med. 2016 Mar;4(5):91. doi: 10.21037/atm.2016.02.11.

DOI: 10.21037/atm.2016.02.11

PMID: 27047950

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4791343/

Abstract

In observational studies, the first step is usually to explore the data distribution and the baseline differences between groups. Data description covers central tendency (e.g., mean, median, and mode) and dispersion (e.g., standard deviation, range, and interquartile range). A variety of bivariate statistical inference methods are available, such as Student's t-test, the Mann-Whitney U test, and the chi-square test, for normally distributed, skewed, and categorical data, respectively. The article shows how to perform these analyses with R code. Furthermore, I believe that automating the whole workflow is of paramount importance because (I) it allows others to reproduce your results; (II) you can easily recall how you performed the analysis during revision; (III) it avoids manual data entry and is less error-prone; and (IV) when you correct the original dataset, the final results are updated automatically by re-running the code. Therefore, the process of making a publication-quality table that incorporates all of the above-mentioned statistics and P values is provided, and readers can customize the code to their own needs.
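The abstract names the summaries and tests but does not show how they work. The article's own code is in R and is not reproduced here. The following is only a rough sketch of the same quantities using the Python standard library. All function names are hypothetical. `student_t_test` assumes equal variances, comparable to R's `t.test(..., var.equal = TRUE)`. `mann_whitney_u` uses the large-sample normal approximation rather than an exact test. `chi_square_2x2` handles only 2x2 tables, where the statistic has one degree of freedom, and applies the Yates continuity correction by default, as R's `chisq.test` does for 2x2 tables.

```python
import math
import statistics


def describe(xs):
    """Central tendency and dispersion of one numeric variable."""
    # method="inclusive" gives the same quartiles as R's default quantile() (type 7)
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return {
        "mean": statistics.mean(xs),
        "median": statistics.median(xs),
        "mode": statistics.multimode(xs),  # every most-frequent value
        "sd": statistics.stdev(xs),        # sample SD, n - 1 denominator
        "range": (min(xs), max(xs)),
        "iqr": q3 - q1,
    }


def _betacf(a, b, x, max_iter=300, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last odd-step factor is ~1: converged
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def student_t_test(x, y):
    """Pooled-variance two-sample t-test; returns (t, two-sided P)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    sp2 = ((nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)) / df
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    # P(|T| > |t|) for T ~ t(df) equals I_{df/(df+t^2)}(df/2, 1/2)
    return t, betainc(df / 2, 0.5, df / (df + t * t))


def mann_whitney_u(x, y):
    """Mann-Whitney U test, normal approximation with tie and continuity corrections."""
    values = sorted(x + y)
    n = len(values)
    ranks, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        ranks[values[i]] = (i + 1 + j) / 2  # mid-rank of a run of tied values
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    nx, ny = len(x), len(y)
    u = sum(ranks[v] for v in x) - nx * (nx + 1) / 2
    sigma = math.sqrt(nx * ny / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = max(abs(u - nx * ny / 2) - 0.5, 0.0) / sigma
    return u, math.erfc(z / math.sqrt(2))  # two-sided P


def chi_square_2x2(table, yates=True):
    """Pearson chi-square for a 2x2 table; df = 1, so P = erfc(sqrt(chi2 / 2))."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            diff = abs(table[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)  # continuity correction
            chi2 += diff * diff / expected
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

For example, `student_t_test([1, 2, 3], [4, 5, 6])` gives t ≈ -3.67 with P ≈ 0.021. `chi_square_2x2([[10, 20], [20, 10]])` gives a Yates-corrected χ² of 5.4 with P ≈ 0.020. The incomplete beta helper is included only so that the t-test P value needs no third-party package. In practice, a vetted statistics library, or the article's R code, is preferable.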

Similar articles

[Meta-analysis of the Italian studies on short-term effects of air pollution].

Epidemiol Prev. 2001 Mar-Apr;25(2 Suppl):1-71.

Erratum: Preparation of Poly(pentafluorophenyl acrylate) Functionalized SiO2 Beads for Protein Purification.

J Vis Exp. 2019 Apr 30(146). doi: 10.3791/6328.

Descriptive and inferential statistical methods used in burns research.

Burns. 2010 May;36(3):343-6. doi: 10.1016/j.burns.2009.04.030. Epub 2009 Jun 21.

Comparison between one day and two days protocols for sentinel node mapping of breast cancer patients.

Hell J Nucl Med. 2011 Sep-Dec;14(3):313-5.

Unadjusted Bivariate Two-Group Comparisons: When Simpler is Better.

Anesth Analg. 2018 Jan;126(1):338-342. doi: 10.1213/ANE.0000000000002636.

Trends in statistical methods in articles published in Archives of Plastic Surgery between 2012 and 2017.

Arch Plast Surg. 2018 May;45(3):207-213. doi: 10.5999/aps.2018.00010. Epub 2018 May 15.

Statistics for the nonstatistician: Part I.

South Med J. 2012 Mar;105(3):126-30. doi: 10.1097/SMJ.0b013e3182498ad5.

Putative null distributions corresponding to tests of differential expression in the Golden Spike dataset are intensity dependent.

BMC Genomics. 2007 Apr 19;8:105. doi: 10.1186/1471-2164-8-105.

Univariate and bivariate likelihood-based meta-analysis methods performed comparably when marginal sensitivity and specificity were the targets of inference.

J Clin Epidemiol. 2017 Mar;83:8-17. doi: 10.1016/j.jclinepi.2016.12.003. Epub 2017 Jan 4.

Cited by

Evaluating In-Hospital Arrhythmias in Critically Ill Acute Kidney Injury Patients: Predictive Models, Mortality Risks, and the Efficacy of Antiarrhythmic Drugs.

J Clin Med. 2025 Jun 26;14(13):4552. doi: 10.3390/jcm14134552.

Machine learning-based prediction of mortality risk in AIDS patients with comorbid common AIDS-related diseases or symptoms.

Front Public Health. 2025 Mar 12;13:1544351. doi: 10.3389/fpubh.2025.1544351. eCollection 2025.

Distant metastasis patterns among lung cancer subtypes and impact of primary tumor resection on survival in metastatic lung cancer using SEER database.

Sci Rep. 2024 Sep 28;14(1):22445. doi: 10.1038/s41598-024-73389-6.

A cross-sectional examination of the relationship between learning environment and anxiety among dental hygiene students.

J Dent Educ. 2025 Jan;89(1):17-24. doi: 10.1002/jdd.13694. Epub 2024 Aug 23.

Establishing a risk prediction model for residual pulmonary vascular obstruction after regular anticoagulant therapy for non-high-risk pulmonary embolism.

J Thorac Dis. 2024 Jul 30;16(7):4447-4459. doi: 10.21037/jtd-23-1876. Epub 2024 Jul 25.

Establishment of a Risk Prediction Model for Metabolic Syndrome in High Altitude Areas in Qinghai Province, China: A Cross-Sectional Study.

Diabetes Metab Syndr Obes. 2024 May 17;17:2041-2052. doi: 10.2147/DMSO.S445650. eCollection 2024.

Associations between life's essential 8 and metabolic health among us adults: insights of NHANES from 2005 to 2018.

Acta Diabetol. 2024 Aug;61(8):963-974. doi: 10.1007/s00592-024-02277-2. Epub 2024 Apr 7.

Hypermagnesaemia, but Not Hypomagnesaemia, Is a Predictor of Inpatient Mortality in Critically Ill Children with Sepsis.

Dis Markers. 2022 Jan 27;2022:3893653. doi: 10.1155/2022/3893653. eCollection 2022.

Statin therapy associated with decreased neuronal injury measured by serum S100β levels in patients with acute ischemic stroke.

Int J Crit Illn Inj Sci. 2021 Oct-Dec;11(4):246-252. doi: 10.4103/IJCIIS.IJCIIS_7_20. Epub 2021 Dec 18.

Comparison of the clinical characteristics and mortalities of severe COVID-19 patients between pre- and post-menopause women and age-matched men.

Aging (Albany NY). 2021 Sep 22;13(18):21903-21913. doi: 10.18632/aging.203532.
