Identification of the associations between genes and quantitative traits using entropy-based kernel density estimation.

Author information

Yee Jaeyong, Park Taesung, Park Mira

Affiliations

Department of Physiology and Biophysics, Eulji University, Daejeon 34824, Korea.

Department of Statistics, Seoul National University, Seoul 08826, Korea.

Publication information

Genomics Inform. 2022 Jun;20(2):e17. doi: 10.5808/gi.22033. Epub 2022 Jun 30.

Abstract

Genetic associations have been quantified using a number of statistical measures. Entropy-based mutual information may be one of the more direct ways of estimating the association, in the sense that it does not depend on the parametrization. For this purpose, both the entropy and conditional entropy of the phenotype distribution should be obtained. Quantitative traits, however, do not usually allow an exact evaluation of entropy. The estimation of entropy needs a probability density function, which can be approximated by kernel density estimation. We have investigated the proper sequence of procedures for combining the kernel density estimation and entropy estimation with a probability density function in order to calculate mutual information. Genotypes and their interactions were constructed to set the conditions for conditional entropy. Extensive simulation data created using three types of generating functions were analyzed using two different kernels as well as two types of multifactor dimensionality reduction and another probability density approximation method called m-spacing. The statistical power in terms of correct detection rates was compared. Using kernels was found to be most useful when the trait distributions were more complex than simple normal or gamma distributions. A full-scale genomic dataset was explored to identify associations using the 2-h oral glucose tolerance test results and γ-glutamyl transpeptidase levels as phenotypes. Clearly distinguishable single-nucleotide polymorphisms (SNPs) and interacting SNP pairs associated with these phenotypes were found and listed with empirical p-values.
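The quantity described above is the mutual information I(G; Y) = H(Y) - H(Y | G), where Y is the quantitative trait and G is a genotype (or a genotype combination for interacting SNPs). The following is a minimal sketch of that idea, not the authors' procedure: it assumes a Gaussian kernel with SciPy's default Scott's-rule bandwidth, a resubstitution entropy estimate (the negative mean log of the estimated density at the sample points), and a permutation-based empirical p-value. The function names, the simulated data, and the number of permutations are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde


def kde_entropy(y):
    """Resubstitution estimate of differential entropy (in nats).

    Assumption: Gaussian kernel with Scott's-rule bandwidth (SciPy default).
    """
    y = np.asarray(y, dtype=float)
    kde = gaussian_kde(y)
    return -np.mean(np.log(kde(y)))


def mutual_information(y, g):
    """Estimate I(G; Y) = H(Y) - sum_g P(G = g) * H(Y | G = g)."""
    y = np.asarray(y, dtype=float)
    g = np.asarray(g)
    h_cond = 0.0
    for level in np.unique(g):
        mask = g == level
        # Weight each genotype's entropy by its observed frequency.
        h_cond += mask.mean() * kde_entropy(y[mask])
    return kde_entropy(y) - h_cond


def empirical_p_value(y, g, n_perm=200, seed=0):
    """Permutation p-value: shuffle genotypes to break any association."""
    rng = np.random.default_rng(seed)
    observed = mutual_information(y, g)
    null = np.array(
        [mutual_information(y, rng.permutation(g)) for _ in range(n_perm)]
    )
    # Add-one correction keeps the p-value strictly positive.
    p = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p


# Hypothetical simulated data: genotype coded 0/1/2 shifts the trait mean.
rng = np.random.default_rng(42)
g = rng.integers(0, 3, size=300)
y = rng.normal(loc=2.0 * g, scale=1.0)

mi_obs, p_val = empirical_p_value(y, g)
print(f"estimated MI = {mi_obs:.3f} nats, empirical p = {p_val:.4f}")
```

For an interacting SNP pair, the same functions could be reused by passing a combined label (for example `3 * g1 + g2`) as `g`, provided every genotype combination has enough samples for a density estimate.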

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eea4/9299569/d606d6a93737/gi-22033f1.jpg
