Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany.
Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada.
Stat Med. 2024 Apr 30;43(9):1804-1825. doi: 10.1002/sim.10012. Epub 2024 Feb 14.
Statistical data simulation is essential in the development of statistical models and methods as well as in their performance evaluation. To capture complex data structures, in particular for high-dimensional data, a variety of simulation approaches have been introduced, including parametric and so-called plasmode simulations. While there are concerns about the realism of parametrically simulated data, it is widely claimed that plasmodes come very close to reality with some aspects of the "truth" known. However, there are no explicit guidelines or established state-of-the-art methods for performing plasmode data simulations. In the present paper, we first review the existing literature and introduce the concept of statistical plasmode simulation. We then discuss the advantages and challenges of statistical plasmodes and provide a step-wise procedure for their generation, including key steps for their implementation and reporting. Finally, we illustrate the concept of statistical plasmodes as well as the proposed plasmode generation procedure using a publicly available real RNA data set from breast carcinoma patients.
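The general idea of a plasmode can be illustrated with a minimal sketch. It assumes one common construction: covariates are resampled with replacement from a real data set, so their joint structure is retained, and the outcome is generated from a user-specified model whose coefficients are therefore known. This is not the step-wise procedure proposed in the paper. The function name `plasmode_sample`, the linear outcome model, the coefficient vector `beta`, and the stand-in matrix `X_real` are all hypothetical choices made for illustration.

```python
import numpy as np


def plasmode_sample(X_real, beta, n, sigma=1.0, rng=None):
    """Draw one illustrative plasmode data set (hypothetical helper).

    Covariates come from the real data; only the outcome is simulated.
    """
    rng = np.random.default_rng(rng)
    # Resample rows of the real covariate matrix with replacement, so the
    # empirical correlation structure of the covariates is preserved.
    idx = rng.integers(0, X_real.shape[0], size=n)
    X = X_real[idx]
    # Generate the outcome from an assumed linear model; beta is the known
    # "truth" against which a method can later be evaluated.
    y = X @ beta + rng.normal(0.0, sigma, size=n)
    return X, y


# Stand-in for a real covariate matrix (e.g. expression values); in an actual
# plasmode study this would be loaded from the real data set instead.
X_real = np.random.default_rng(0).lognormal(size=(50, 5))
beta = np.array([1.0, 0.0, -0.5, 0.0, 2.0])  # assumed true effects

X, y = plasmode_sample(X_real, beta, n=200, sigma=0.5, rng=1)
```

Repeating the call with different seeds gives independent plasmode data sets on which an estimator's recovery of `beta` can be assessed.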