Department of Statistics and Applied Probability, University of California, Santa Barbara, CA 93106.
Department of Statistical Science, Fox School of Business, Temple University, Philadelphia, PA 19122.
Proc Natl Acad Sci U S A. 2020 Aug 11;117(32):19045-19053. doi: 10.1073/pnas.1815563117. Epub 2020 Jul 28.
Data analyses typically rely upon assumptions about the missingness mechanisms that lead to observed versus missing data, assumptions that are typically unassessable. We explore an approach where the joint distribution of observed data and missing data is specified in a nonstandard way. In this formulation, which traces back to a representation of the joint distribution of the data and missingness mechanism, apparently first proposed by J. W. Tukey, the modeling assumptions about the distributions are either assessable or are designed to allow relatively easy incorporation of substantive knowledge about the problem at hand, thereby offering a possibly realistic portrayal of the data, both observed and missing. We develop Tukey's representation for exponential-family models, propose a computationally tractable approach to inference in this class of models, and offer some general theoretical comments. We then illustrate the utility of this approach with an example in systems biology.