Identification of the associations between genes and quantitative traits using entropy-based kernel density estimation.

Author information

Yee Jaeyong, Park Taesung, Park Mira

Affiliations

Department of Physiology and Biophysics, Eulji University, Daejeon 34824, Korea.

Department of Statistics, Seoul National University, Seoul 08826, Korea.

Publication information

Genomics Inform. 2022 Jun;20(2):e17. doi: 10.5808/gi.22033. Epub 2022 Jun 30.

DOI: 10.5808/gi.22033

PMID: 35794697

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9299569/

Abstract

Genetic associations have been quantified using a number of statistical measures. Entropy-based mutual information may be one of the more direct ways of estimating the association, in the sense that it does not depend on the parametrization. For this purpose, both the entropy and conditional entropy of the phenotype distribution should be obtained. Quantitative traits, however, do not usually allow an exact evaluation of entropy. The estimation of entropy needs a probability density function, which can be approximated by kernel density estimation. We have investigated the proper sequence of procedures for combining the kernel density estimation and entropy estimation with a probability density function in order to calculate mutual information. Genotypes and their interactions were constructed to set the conditions for conditional entropy. Extensive simulation data created using three types of generating functions were analyzed using two different kernels as well as two types of multifactor dimensionality reduction and another probability density approximation method called m-spacing. The statistical power in terms of correct detection rates was compared. Using kernels was found to be most useful when the trait distributions were more complex than simple normal or gamma distributions. A full-scale genomic dataset was explored to identify associations using the 2-h oral glucose tolerance test results and γ-glutamyl transpeptidase levels as phenotypes. Clearly distinguishable single-nucleotide polymorphisms (SNPs) and interacting SNP pairs associated with these phenotypes were found and listed with empirical p-values.
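The abstract frames association as mutual information: the entropy of the trait minus its conditional entropy given genotype, with each density approximated by kernel density estimation (or by m-spacing). The Python sketch below illustrates that decomposition, I(Y; G) = H(Y) - Σ_g P(G = g) H(Y | G = g). It is not the authors' implementation. The Gaussian kernel, the normal-reference bandwidth 1.06 · sd · n^(-1/5), the resubstitution entropy estimator, the Vasicek form of the m-spacing estimator with m ≈ √n, and the function names `bandwidth`, `kde_entropy`, `m_spacing_entropy` and `mutual_information` are all assumptions chosen for illustration.

```python
import math
import random
from collections import defaultdict

SQRT_2PI = math.sqrt(2.0 * math.pi)


def bandwidth(xs):
    """Normal-reference rule of thumb h = 1.06 * sd * n^(-1/5) (assumed choice)."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return 1.06 * sd * n ** (-0.2)


def kde_entropy(xs):
    """Resubstitution estimate H = -(1/n) * sum_i log f_hat(x_i) with a Gaussian kernel.

    Assumes a continuous trait (non-zero spread); O(n^2) pure-Python loop.
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations per group")
    h = bandwidth(xs)
    total = 0.0
    for xi in xs:
        # f_hat(xi) = 1/(n h) * sum_j phi((xi - xj) / h); the j == i term keeps it > 0
        dens = sum(math.exp(-0.5 * ((xi - xj) / h) ** 2) for xj in xs) / (n * h * SQRT_2PI)
        total -= math.log(dens)
    return total / n


def m_spacing_entropy(xs, m=None):
    """Vasicek m-spacing estimate; order statistics are clamped at the sample ends.

    Assumes no long runs of tied values (a zero spacing would make log() fail).
    """
    ys = sorted(xs)
    n = len(ys)
    if m is None:
        m = max(1, round(math.sqrt(n)))  # heuristic window size, an assumption
    total = 0.0
    for i in range(n):
        upper = ys[min(i + m, n - 1)]
        lower = ys[max(i - m, 0)]
        total += math.log(n / (2 * m) * (upper - lower))
    return total / n


def mutual_information(trait, genotype, entropy=kde_entropy):
    """I(Y; G) = H(Y) - sum_g P(G = g) * H(Y | G = g), in nats."""
    if len(trait) != len(genotype):
        raise ValueError("trait and genotype must have the same length")
    groups = defaultdict(list)
    for y, g in zip(trait, genotype):
        groups[g].append(y)
    n = len(trait)
    h_cond = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(trait) - h_cond


if __name__ == "__main__":
    rng = random.Random(0)
    g = [rng.randrange(3) for _ in range(600)]              # hypothetical SNP coded 0/1/2
    y_null = [rng.gauss(0.0, 1.0) for _ in g]               # trait unrelated to g
    y_alt = [2.0 * gi + rng.gauss(0.0, 1.0) for gi in g]    # additive genotype effect
    print("KDE MI, null:", mutual_information(y_null, g))
    print("KDE MI, additive:", mutual_information(y_alt, g))
    print("m-spacing MI, additive:", mutual_information(y_alt, g, m_spacing_entropy))
```

To condition on an interacting SNP pair instead of a single SNP, the `genotype` argument can be the joint genotype, for example `list(zip(snp_a, snp_b))`, which yields up to nine cells; in this sketch each cell then needs at least two observations for its conditional entropy to be estimated.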
