tableone: An open source Python package for producing summary statistics for research papers.

Authors

Pollard Tom J, Johnson Alistair E W, Raffa Jesse D, Mark Roger G

Affiliations

Massachusetts Institute of Technology (MIT), MIT Laboratory for Computational Physiology, Cambridge, Massachusetts, USA.

Publication information

JAMIA Open. 2018 May 23;1(1):26-31. doi: 10.1093/jamiaopen/ooy012. eCollection 2018 Jul.

DOI:10.1093/jamiaopen/ooy012

PMID:31984317

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6951995/

Abstract

OBJECTIVES

In quantitative research, understanding basic parameters of the study population is key for interpretation of the results. As a result, it is typical for the first table ("Table 1") of a research paper to include summary statistics for the study data. Our objectives are 2-fold. First, we seek to provide a simple, reproducible method for providing summary statistics for research papers in the Python programming language. Second, we seek to use the package to improve the quality of summary statistics reported in research papers.
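To make the idea of a "Table 1" concrete, the following is a minimal sketch using only the Python standard library. It is not the tableone package or its API: the toy records, the variable names, and the helper functions `summarize_continuous`, `summarize_categorical`, `table_one`, and `to_csv` are all hypothetical and chosen for illustration. It reports mean (SD) for a continuous variable and n (%) for one level of a categorical variable, stratified by group.

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical toy records for illustration; not data from the paper.
records = [
    {"age": 65, "sex": "F", "group": "treated"},
    {"age": 72, "sex": "M", "group": "treated"},
    {"age": 58, "sex": "F", "group": "control"},
    {"age": 80, "sex": "M", "group": "control"},
    {"age": 69, "sex": "F", "group": "control"},
]


def summarize_continuous(values):
    """Return 'mean (SD)' using the sample standard deviation."""
    return f"{statistics.mean(values):.1f} ({statistics.stdev(values):.1f})"


def summarize_categorical(values, level):
    """Return 'n (%)' for one level of a categorical variable."""
    n = Counter(values)[level]
    return f"{n} ({100 * n / len(values):.1f})"


def table_one(rows, group_key):
    """Build a small summary table as a list of rows, one column per group."""
    groups = sorted({r[group_key] for r in rows})
    by_group = {g: [r for r in rows if r[group_key] == g] for g in groups}
    table = [["variable"] + groups]
    table.append(["n"] + [str(len(by_group[g])) for g in groups])
    table.append(
        ["age, mean (SD)"]
        + [summarize_continuous([r["age"] for r in by_group[g]]) for g in groups]
    )
    table.append(
        ["sex = F, n (%)"]
        + [summarize_categorical([r["sex"] for r in by_group[g]], "F") for g in groups]
    )
    return table


def to_csv(table):
    """Serialize the table to CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(table)
    return buffer.getvalue()


table = table_one(records, "group")
print(to_csv(table))
```

Generating the table from code rather than by hand means it can be regenerated whenever the underlying data change, which is the reproducibility benefit the objectives describe.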

MATERIALS AND METHODS

The package is developed following good practice guidelines for scientific computing and all code is made available under a permissive MIT License. A testing framework runs on a continuous integration server, helping to maintain code stability. Issues are tracked openly and public contributions are encouraged.

RESULTS

The software package automatically compiles summary statistics into publishable formats such as CSV, HTML, and LaTeX. An executable Jupyter Notebook demonstrates application of the package to a subset of data from the MIMIC-III database. Tests such as Tukey's rule for outlier detection and Hartigan's Dip Test for modality are computed to highlight potential issues in summarizing the data.
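Tukey's rule, mentioned above, flags values that lie more than 1.5 times the interquartile range below the first quartile or above the third quartile. The sketch below illustrates the rule with the standard library only. It is not taken from the package's source: the function name `tukey_outliers`, the quartile convention (`method="inclusive"`), and the sample values are assumptions made for this example, and different quartile conventions can give slightly different fences.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values falling outside Tukey's fences.

    The fences are Q1 - k*IQR and Q3 + k*IQR. The quartile convention
    (method="inclusive") is an assumption made for this sketch.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical heart-rate measurements with one implausible value.
heart_rates = [72, 75, 78, 80, 81, 83, 85, 88, 90, 180]
outliers = tukey_outliers(heart_rates)
print(outliers)
```

A flagged value is a prompt to inspect the data, for example to check for a data-entry error or to consider reporting the median and interquartile range instead of the mean, rather than an instruction to delete it.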

DISCUSSION AND CONCLUSION

We present open source software for researchers to facilitate carrying out reproducible studies in Python, an increasingly popular language in scientific research. The toolkit is intended to mature over time with community feedback and input. Development of a common tool for summarizing data may help to promote good practice when used as a supplement to existing guidelines and recommendations. We encourage use of tableone alongside other methods of descriptive statistics and, in particular, visualization to ensure appropriate data handling. We also suggest seeking guidance from a statistician when using the package for a research study, especially prior to submitting the study for publication.
