Columbia University, New York, NY, USA.
Fourier Genetics, Austin, TX, USA.
J Theor Biol. 2022 May 7;540:110985. doi: 10.1016/j.jtbi.2021.110985. Epub 2021 Dec 23.
This paper explores the genotype-phenotype relationship. It outlines conditions under which the dependence of a quantitative trait on the genome might be predictable from measurements of a limited subset of genotypes. It systematically uses the theory of real-valued Boolean functions to translate trait data into the Fourier domain. Important trait features, such as the roughness of the trait landscape or the modularity of a trait, have a simple Fourier interpretation. Ruggedness at a gene location corresponds to high sensitivity to mutation, while a modular organization of gene activity reduces such sensitivity. Traits where rugged loci are rare will naturally compress gene data in the Fourier domain, leading to a sparse representation of trait data, concentrated in identifiable, low-level coefficients. This Fourier representation of a trait organizes epistasis in a form which is isometric to the trait data. As Fourier matrices are known to be maximally incoherent with the standard basis, this permits employing compressive sensing techniques to work from data sets that are relatively small (sometimes even of polynomial size) compared to the exponentially large sets of possible genomes. This work provides a theoretical underpinning for the systematic use of Boolean function machinery to dissect the dependency of a trait on the genome and environment.
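As a minimal illustration of the Fourier representation described above, the sketch below applies a normalized Walsh-Hadamard transform to a toy trait defined on three biallelic loci. The trait values, the function names `walsh_hadamard` and `toy_trait`, and the bit ordering are assumptions made for this example only; they are not taken from the paper. The sketch shows two of the abstract's claims in miniature: the transform is an isometry (the sum of squares is preserved), and a trait with only low-order interactions has zero coefficients at higher order.

```python
from itertools import product


def walsh_hadamard(values):
    """Normalized Walsh-Hadamard transform of a trait on {0,1}^n.

    values[i] is the trait of the genotype whose bits spell the integer i.
    The 1/sqrt(2^n) scaling makes the transform an isometry.
    """
    coeffs = [float(v) for v in values]
    size = len(coeffs)
    if size == 0 or size & (size - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < size:
        # Butterfly step: combine pairs of entries that differ in one bit.
        for start in range(0, size, 2 * h):
            for i in range(start, start + h):
                a, b = coeffs[i], coeffs[i + h]
                coeffs[i], coeffs[i + h] = a + b, a - b
        h *= 2
    scale = size ** -0.5
    return [c * scale for c in coeffs]


def toy_trait(g):
    """Hypothetical trait: additive effects plus one pairwise interaction."""
    return 1.0 + 0.5 * g[0] - 0.25 * g[1] + 0.75 * g[2] + 0.4 * g[0] * g[1]


n_loci = 3
# product() enumerates genotypes so that g[0] is the most significant bit.
genotypes = list(product((0, 1), repeat=n_loci))
trait_values = [toy_trait(g) for g in genotypes]
coeffs = walsh_hadamard(trait_values)

# Group coefficients by interaction order (number of loci in the index set).
energy_by_order = [0.0] * (n_loci + 1)
for index, c in enumerate(coeffs):
    energy_by_order[bin(index).count("1")] += c * c

print("energy by interaction order:", energy_by_order)
```

In this toy case the third-order energy vanishes because the trait contains no three-way interaction. This is a small instance of the sparsity that, per the abstract, makes compressive sensing applicable.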