Department of Statistics, Stanford University, Stanford, CA 94305;
Department of Data Sciences and Operations, Marshall School of Business, University of Southern California, Los Angeles, CA 90089.
Proc Natl Acad Sci U S A. 2020 Sep 29;117(39):24117-24126. doi: 10.1073/pnas.2007743117. Epub 2020 Sep 18.
We introduce a method to draw causal inferences (inferences immune to all possible confounding) from genetic data that include parents and offspring. Causal conclusions are possible with these data because the natural randomness in meiosis can be viewed as a high-dimensional randomized experiment. We make this observation actionable by developing a conditional independence test that identifies regions of the genome containing distinct causal variants. The proposed digital twin test compares an observed offspring to carefully constructed synthetic offspring from the same parents to determine statistical significance, and it can leverage any black-box multivariate model and additional nontrio genetic data to increase power. Crucially, our inferences are based only on a well-established mathematical model of recombination and make no assumptions about the relationship between the genotypes and phenotypes. We compare our method to the widely used transmission disequilibrium test and demonstrate enhanced power and localization.
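The idea of comparing an observed offspring with synthetic offspring from the same parents can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several simplifying assumptions: parental haplotypes are known and phased, recombination is approximated by a constant per-site switch probability (`switch_prob`, a stand-in for a proper recombination model), and the whole chromosome is resampled at once, so the sketch does not attempt the region-by-region localization described above. All function names, the trio dictionary layout, and the example statistic are hypothetical.

```python
import random


def sample_transmitted(hap_a, hap_b, switch_prob, rng):
    """Draw one gamete from a parent with haplotypes hap_a and hap_b.

    Assumption: the copied haplotype switches between adjacent sites with a
    constant probability switch_prob (a crude recombination model).
    """
    current = rng.randrange(2)  # start on either haplotype with probability 1/2
    gamete = []
    for j in range(len(hap_a)):
        if j > 0 and rng.random() < switch_prob:
            current = 1 - current
        gamete.append(hap_a[j] if current == 0 else hap_b[j])
    return gamete


def sample_offspring(mother, father, switch_prob, rng):
    """Synthetic offspring genotype (allele counts 0/1/2) from two parents."""
    from_mother = sample_transmitted(mother[0], mother[1], switch_prob, rng)
    from_father = sample_transmitted(father[0], father[1], switch_prob, rng)
    return [a + b for a, b in zip(from_mother, from_father)]


def abs_covariance(genotypes, phenotypes):
    """Example statistic: sum over sites of |cov(genotype, phenotype)|.

    Any function of (genotypes, phenotypes) could be plugged in instead.
    """
    n = len(phenotypes)
    y_mean = sum(phenotypes) / n
    total = 0.0
    for j in range(len(genotypes[0])):
        col = [g[j] for g in genotypes]
        x_mean = sum(col) / n
        cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(col, phenotypes)) / n
        total += abs(cov)
    return total


def digital_twin_pvalue(trios, phenotypes, statistic,
                        n_twins=200, switch_prob=0.01, seed=0):
    """Randomization p-value comparing observed offspring with synthetic ones.

    trios: list of dicts with keys "mother", "father" (each a pair of
    haplotypes) and "offspring" (a genotype vector).
    """
    rng = random.Random(seed)
    observed = statistic([t["offspring"] for t in trios], phenotypes)
    exceed = 0
    for _ in range(n_twins):
        twins = [sample_offspring(t["mother"], t["father"], switch_prob, rng)
                 for t in trios]
        if statistic(twins, phenotypes) >= observed:
            exceed += 1
    # The +1 terms count the observed data as one of the draws, which keeps
    # the p-value valid for a finite number of synthetic offspring.
    return (1 + exceed) / (1 + n_twins)
```

Because the statistic is only ever compared against its own distribution under resampled offspring, it can be swapped for the output of any fitted model without changing the validity argument; only the resampling step depends on the recombination model.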