Statistical Approach for Biologically Relevant Gene Selection from High-Throughput Gene Expression Data.

Author information

Das Samarendra, Rai Shesh N

Affiliations

Division of Statistical Genetics, Indian Council of Agricultural Research (ICAR)-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India.

Netaji Subhas-Indian Council of Agricultural Research (ICAR) International Fellow, Indian Council of Agricultural Research, Krishi Bhawan, New Delhi 110001, India.

Publication information

Entropy (Basel). 2020 Oct 25;22(11):1205. doi: 10.3390/e22111205.

Abstract

Selection of biologically relevant genes from high-dimensional expression data is a key research problem in gene expression genomics. Most available gene selection methods are based on either a relevancy or a redundancy measure, and they are usually judged by post-selection classification accuracy. In these methods, genes are ranked on a single high-dimensional expression dataset, which leads to the selection of spuriously associated and redundant genes. Hence, we developed a statistical approach that combines a support vector machine with Maximum Relevance and Minimum Redundancy (MRMR) under a sound statistical setup for the selection of biologically relevant genes. Here, genes are selected through statistical significance values, which are computed using a nonparametric test statistic under a bootstrap-based subject sampling model. Further, a systematic and rigorous evaluation of the proposed approach against nine existing competitive methods was carried out on six different real crop gene expression datasets. This performance analysis was conducted under three comparison settings: subject classification, and biological relevance criteria based on quantitative trait loci (QTL) and on gene ontology (GO). Our analytical results showed that the proposed approach selects genes that are more biologically relevant than those selected by the existing methods. Moreover, the proposed approach was also found to perform better than the competing existing methods. The proposed statistical approach provides a framework for combining filter and wrapper methods of gene selection.
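The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the general idea under stated assumptions, not the authors' implementation. Subjects are resampled by a stratified bootstrap. Greedy MRMR is run on each bootstrap sample, and each gene is then assigned a p-value from how often it is selected. Everything specific here is an assumption: an absolute t-like score stands in for relevance (the SVM wrapper step is not reproduced), absolute Pearson correlation stands in for redundancy, and a one-sided binomial (sign-type) test on selection counts stands in for the paper's nonparametric statistic. The function names (`t_score`, `abs_corr`, `mrmr`, `binom_sf`, `bootstrap_pvalues`) and the toy data are hypothetical.

```python
import math
import random
import statistics

# Hypothetical toy data: 4 genes (rows) x 8 subjects (columns), two classes.
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]
GENES = [
    [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0],  # strongly differential
    [2.0, 3.0, 2.5, 3.5, 3.0, 2.0, 3.5, 2.5],  # noise: same values in both classes
    [1.0, 1.5, 2.0, 2.5, 2.0, 2.5, 3.0, 3.5],  # moderately differential
    [4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0],  # constant
]


def t_score(values, labels):
    """Relevance stand-in (assumption): absolute Welch-style t-like score."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    var = statistics.pvariance(a) / len(a) + statistics.pvariance(b) / len(b)
    # The small constant avoids division by zero when both classes are constant.
    return abs(statistics.mean(a) - statistics.mean(b)) / math.sqrt(var + 1e-12)


def abs_corr(x, y):
    """Redundancy stand-in (assumption): |Pearson r|, 0 for a constant gene."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def mrmr(genes, labels, k):
    """Greedy MRMR in difference form: scaled relevance minus mean redundancy."""
    rel = [t_score(g, labels) for g in genes]
    top = max(rel) or 1.0  # scale relevance to [0, 1] so it is comparable to |r|
    rel = [r / top for r in rel]
    selected = [max(range(len(genes)), key=lambda i: rel[i])]

    def score(i):
        redundancy = statistics.mean([abs_corr(genes[i], genes[j]) for j in selected])
        return rel[i] - redundancy

    while len(selected) < k:
        best = max((i for i in range(len(genes)) if i not in selected), key=score)
        selected.append(best)
    return selected


def binom_sf(c, n, p):
    """P(X >= c) for X ~ Binomial(n, p), clamped to 1 against rounding."""
    tail = sum(math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(c, n + 1))
    return min(1.0, tail)


def bootstrap_pvalues(genes, labels, k=2, n_boot=50, seed=0):
    """Selection-frequency p-values over stratified bootstrap samples of subjects."""
    rng = random.Random(seed)
    idx0 = [i for i, y in enumerate(labels) if y == 0]
    idx1 = [i for i, y in enumerate(labels) if y == 1]
    counts = [0] * len(genes)
    for _ in range(n_boot):
        # Resample subjects within each class so both classes are always present.
        sample = [rng.choice(idx0) for _ in idx0] + [rng.choice(idx1) for _ in idx1]
        boot_genes = [[g[i] for i in sample] for g in genes]
        boot_labels = [labels[i] for i in sample]
        for gi in mrmr(boot_genes, boot_labels, k):
            counts[gi] += 1
    # Null hypothesis (assumption): each gene is picked with probability k / G.
    p_null = k / len(genes)
    return [binom_sf(c, n_boot, p_null) for c in counts]


pvals = bootstrap_pvalues(GENES, LABELS, k=1, n_boot=50, seed=1)
print([f"{p:.3g}" for p in pvals])
```

In this toy run, the strongly differential gene is chosen in every bootstrap sample, so it receives a very small p-value, while the other genes receive p-values near 1. Relevance is rescaled to [0, 1] before the redundancy penalty is subtracted so that the two terms are on comparable scales. The published method's relevance measure, test statistic, and null model may differ from these choices.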
