Bourdais Théo, Batlle Pau, Yang Xianjin, Baptista Ricardo, Rouquette Nicolas, Owhadi Houman
Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125.
Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA 91109.
Proc Natl Acad Sci U S A. 2024 Aug 6;121(32):e2403449121. doi: 10.1073/pnas.2403449121. Epub 2024 Aug 1.
Most problems within and beyond the scientific domain can be framed into one of the following three levels of complexity of function approximation. Type 1: Approximate an unknown function given input/output data. Type 2: Consider a collection of variables and functions, some of which are unknown, indexed by the nodes and hyperedges of a hypergraph (a generalized graph where edges can connect more than two vertices). Given partial observations of the variables of the hypergraph (satisfying the functional dependencies imposed by its structure), approximate all the unobserved variables and unknown functions. Type 3: Expanding on Type 2, if the hypergraph structure itself is unknown, use partial observations of the variables of the hypergraph to discover its structure and approximate its unknown functions. These hypergraphs offer a natural platform for organizing, communicating, and processing computational knowledge. While most scientific problems can be framed as the data-driven discovery of unknown functions in a computational hypergraph whose structure is known (Type 2), many require the data-driven discovery of the structure (connectivity) of the hypergraph itself (Type 3). We introduce an interpretable Gaussian Process (GP) framework for such (Type 3) problems that does not require randomization of the data, access to or control over its sampling, or sparsity of the unknown functions in a known or learned basis. Its polynomial complexity, which contrasts sharply with the super-exponential complexity of causal inference methods, is enabled by the nonlinear ANOVA capabilities of GPs used as a sensing mechanism.
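As a rough illustration of how a GP can act as a sensing mechanism for functional dependencies, the sketch below fits a GP regression with an additive kernel (one Gaussian kernel per candidate input variable) and reports each component's share of the squared RKHS norm of the posterior mean. It is a minimal sketch under stated assumptions, not the authors' algorithm: the names `rbf_1d` and `component_shares`, the kernel choice, the lengthscale, the noise level, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def rbf_1d(a, b, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix between two 1-D sample arrays."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def component_shares(X, y, noise=1e-2, lengthscale=1.0):
    """Energy share of each candidate input under an additive-kernel GP fit.

    Hypothetical helper: the kernel is K = sum_j K_j, with one RBF kernel per
    column of X. The posterior mean splits as f = sum_j f_j, where
    f_j = K_j(., X) @ alpha and ||f_j||^2 = alpha^T K_j alpha.
    """
    n, d = X.shape
    Ks = [rbf_1d(X[:, j], X[:, j], lengthscale) for j in range(d)]
    K = sum(Ks)
    # Representer weights of the GP posterior mean (kernel ridge regression).
    alpha = np.linalg.solve(K + noise * np.eye(n), y)
    # Squared RKHS norm carried by each additive component.
    energies = np.array([alpha @ Kj @ alpha for Kj in Ks])
    return energies / energies.sum()


# Synthetic example (assumed data): y depends only on the first of three inputs.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(100, 3))
y = np.sin(2.0 * X[:, 0]) + 0.01 * rng.standard_normal(100)
print(component_shares(X, y))
```

In this toy setting, a share close to zero suggests that the corresponding variable could be dropped as a candidate ancestor of the target, while a dominant share flags a likely dependency. Repeating such a test for each node of a hypergraph is one plausible way to probe its connectivity, though the thresholds and kernels a real application would need are not specified here.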