
tableone: An open source Python package for producing summary statistics for research papers.

Author information

Pollard Tom J, Johnson Alistair E W, Raffa Jesse D, Mark Roger G

Affiliations

Massachusetts Institute of Technology (MIT), MIT Laboratory for Computational Physiology, Cambridge, Massachusetts, USA.

Publication information

JAMIA Open. 2018 May 23;1(1):26-31. doi: 10.1093/jamiaopen/ooy012. eCollection 2018 Jul.

Abstract

OBJECTIVES

In quantitative research, understanding basic parameters of the study population is key for interpretation of the results. As a result, it is typical for the first table ("Table 1") of a research paper to include summary statistics for the study data. Our objectives are 2-fold. First, we seek to provide a simple, reproducible method for providing summary statistics for research papers in the Python programming language. Second, we seek to use the package to improve the quality of summary statistics reported in research papers.
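To make concrete what a "Table 1" typically reports, the following is a minimal sketch using only the Python standard library. It does not use the tableone API; the toy cohort, the column names, and the helper functions `summarize_continuous` and `summarize_categorical` are hypothetical and exist only for illustration. It assumes the common convention of reporting mean (SD) for continuous variables and n (%) for categorical ones.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical toy cohort; the values are illustrative, not real study data.
cohort = [
    {"age": 64, "sex": "F"},
    {"age": 71, "sex": "M"},
    {"age": 58, "sex": "F"},
    {"age": 80, "sex": "M"},
    {"age": 67, "sex": "F"},
]


def summarize_continuous(rows, key):
    """Return 'mean (SD)' for a numeric column, using the sample SD."""
    values = [row[key] for row in rows]
    return f"{mean(values):.1f} ({stdev(values):.1f})"


def summarize_categorical(rows, key):
    """Return {level: 'n (%)'} for a categorical column."""
    counts = Counter(row[key] for row in rows)
    total = sum(counts.values())
    return {
        level: f"{n} ({100 * n / total:.1f})"
        for level, n in sorted(counts.items())
    }


# Assemble the rows of a very small "Table 1".
table_one = {
    "n": str(len(cohort)),
    "age, mean (SD)": summarize_continuous(cohort, "age"),
    "sex, n (%)": summarize_categorical(cohort, "sex"),
}

for label, value in table_one.items():
    print(label, value)
```

A dedicated package adds what this sketch leaves out, such as grouping by a stratifying variable, handling missing values, and choosing median and interquartile range for skewed variables.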

MATERIALS AND METHODS

The package is developed following good practice guidelines for scientific computing and all code is made available under a permissive MIT License. A testing framework runs on a continuous integration server, helping to maintain code stability. Issues are tracked openly and public contributions are encouraged.

RESULTS

The software package automatically compiles summary statistics into publishable formats such as CSV, HTML, and LaTeX. An executable Jupyter Notebook demonstrates application of the package to a subset of data from the MIMIC-III database. Tests such as Tukey's rule for outlier detection and Hartigan's Dip Test for modality are computed to highlight potential issues in summarizing the data.
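As an illustration of the kind of check mentioned above, here is a minimal standard-library sketch of Tukey's rule, which flags values lying more than 1.5 interquartile ranges below the first quartile or above the third quartile. This is not the package's own implementation: the function name `tukey_outliers`, the sample data, and the choice of the "inclusive" quartile method are assumptions made for this example, and different quartile conventions can shift the fences slightly. Hartigan's Dip Test is not shown because it has no standard-library equivalent.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    # Quartile convention is an assumption; other methods give nearby fences.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical heart-rate readings with one implausible value.
heart_rates = [66, 70, 74, 75, 78, 80, 81, 85, 190]
flagged = tukey_outliers(heart_rates)
print(flagged)
```

A flagged value is a prompt to inspect the data, not proof of an error; summarizing such a variable with a mean and standard deviation may be misleading.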

DISCUSSION AND CONCLUSION

We present open source software for researchers to facilitate carrying out reproducible studies in Python, an increasingly popular language in scientific research. The toolkit is intended to mature over time with community feedback and input. Development of a common tool for summarizing data may help to promote good practice when used as a supplement to existing guidelines and recommendations. We encourage use of tableone alongside other methods of descriptive statistics and, in particular, visualization to ensure appropriate data handling. We also suggest seeking guidance from a statistician when using the package for a research study, especially prior to submitting the study for publication.

