Kedem Benjamin, Pyne Saumyadipta
Department of Mathematics, Institute for Systems Research, University of Maryland, College Park, MD USA.
Public Health Dynamics Laboratory, Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA USA.
J Stat Theory Pract. 2021;15(2):25. doi: 10.1007/s42519-020-00152-1. Epub 2021 Jan 20.
Synthetic data, when properly used, can enhance patterns in real data and thus provide insights into different problems. Here, the estimation of tail probabilities of rare events from a moderately large number of observations is considered. The problem is approached by a large number of augmentations or fusions of the real data with computer-generated synthetic samples. The tail probability of interest is approximated by subsequences created by a novel iterative process. The estimates are found to be quite precise.