Robust differential expression analysis by learning discriminant boundary in multi-dimensional space of statistical attributes.

Authors

Bei Yuanzhe, Hong Pengyu

Affiliation

Computer Science Department, Brandeis University, Waltham, MA, 02453, USA.

Publication

BMC Bioinformatics. 2016 Dec 19;17(1):541. doi: 10.1186/s12859-016-1386-x.

Abstract

BACKGROUND

Performing statistical tests is an important step in analyzing genome-wide datasets for detecting genomic features differentially expressed between conditions. Each type of statistical test has its own advantages in characterizing certain aspects of differences between population means and often assumes a relatively simple data distribution (e.g., Gaussian, Poisson, negative binomial, etc.), which may not be well met by the datasets of interest. Making insufficient distributional assumptions can lead to inferior results when dealing with complex differential expression patterns.

RESULTS

We propose to capture differential expression information more comprehensively by integrating multiple test statistics, each of which has relatively limited capacity to summarize the observed differential expression information. This work addresses a general application scenario, in which users want to detect as many differentially expressed features (DEFs) as possible while requiring the false discovery rate (FDR) to be lower than a cut-off. We treat each test statistic as a basic attribute, and model the detection of DEFs as learning a discriminant boundary in a multi-dimensional space of basic attributes. We mathematically formulated our goal as a constrained optimization problem aiming to maximize discoveries satisfying a user-defined FDR. An effective algorithm, Discriminant-Cut, has been developed to solve an instantiation of this problem. Extensive comparisons of Discriminant-Cut with 13 existing methods were carried out to demonstrate its robustness and effectiveness.
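The constrained optimization described above (maximize the number of discoveries subject to an FDR cut-off, over boundaries in a space whose axes are test statistics) can be illustrated with a minimal sketch. This is not the Discriminant-Cut algorithm itself, whose details the abstract does not give. The sketch assumes a linear boundary, a crude search over random non-negative directions, and a set of null attribute vectors (for example, test statistics recomputed after permuting condition labels) from which the FDR is estimated. The names `best_cut`, `project`, and `discriminant_search` are hypothetical.

```python
import bisect
import math
import random


def best_cut(scores, null_scores, alpha):
    """Find the largest discovery set {score >= t} with estimated FDR <= alpha.

    Assumed FDR estimate at threshold t:
        (fraction of null scores >= t) * n / (number of observed scores >= t)
    Returns (n_discoveries, threshold), or (0, inf) if no threshold qualifies.
    """
    n, m = len(scores), len(null_scores)
    null_sorted = sorted(null_scores)
    best = (0, math.inf)
    # Walk thresholds from the strictest to the most lenient observed score.
    for k, t in enumerate(sorted(scores, reverse=True), start=1):
        null_ge = m - bisect.bisect_left(null_sorted, t)  # null scores >= t
        est_fdr = (null_ge / m) * n / k
        if est_fdr <= alpha:
            best = (k, t)  # k only grows, so the last qualifying cut is the largest
    return best


def project(rows, w):
    """Project each attribute vector onto direction w (a linear score)."""
    return [sum(wi * xi for wi, xi in zip(w, row)) for row in rows]


def discriminant_search(attrs, null_attrs, alpha, n_directions=200, seed=0):
    """Pick the linear boundary w.x >= t that yields the most discoveries.

    Candidates are each single statistic on its own (the axis directions)
    plus random non-negative unit directions; both choices are assumptions
    made for this illustration.
    """
    rng = random.Random(seed)
    d = len(attrs[0])
    candidates = [[1.0 if j == i else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(n_directions):
        w = [abs(rng.gauss(0.0, 1.0)) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        candidates.append([x / norm for x in w])

    best = (0, math.inf, None)
    for w in candidates:
        k, t = best_cut(project(attrs, w), project(null_attrs, w), alpha)
        if k > best[0]:
            best = (k, t, w)
    return best
```

Because every single statistic is itself a candidate direction, the combined search in this sketch never returns fewer discoveries than the best individual statistic at the same estimated FDR. Note that choosing the direction on the same data used to estimate the FDR makes that estimate optimistic; a careful implementation would need to account for this.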

CONCLUSIONS

We have developed a novel machine learning methodology for robust differential expression analysis, which can be a new avenue to significantly advance research on large-scale differential expression analysis.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b71d/5168810/8df8232fc3b2/12859_2016_1386_Fig1_HTML.jpg
