Robust differential expression analysis by learning discriminant boundary in multi-dimensional space of statistical attributes.

Author information

Bei Yuanzhe, Hong Pengyu

Affiliation

Computer Science Department, Brandeis University, Waltham, MA, 02453, USA.

Publication information

BMC Bioinformatics. 2016 Dec 19;17(1):541. doi: 10.1186/s12859-016-1386-x.

DOI:10.1186/s12859-016-1386-x
PMID:27993137
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5168810/
Abstract

BACKGROUND

Performing statistical tests is an important step in analyzing genome-wide datasets to detect genomic features that are differentially expressed between conditions. Each type of statistical test has its own advantages in characterizing certain aspects of the differences between population means, and each often assumes a relatively simple data distribution (e.g., Gaussian, Poisson, or negative binomial) that the datasets of interest may not satisfy. Making insufficient distributional assumptions can lead to inferior results when dealing with complex differential expression patterns.

RESULTS

We propose to capture differential expression information more comprehensively by integrating multiple test statistics, each of which has a relatively limited capacity to summarize the observed differential expression information. This work addresses a general application scenario in which users want to detect as many differentially expressed genomic features (DEFs) as possible while requiring the false discovery rate (FDR) to be lower than a cut-off. We treat each test statistic as a basic attribute, and model the detection of DEFs as learning a discriminant boundary in a multi-dimensional space of basic attributes. We mathematically formulate our goal as a constrained optimization problem that aims to maximize the number of discoveries while satisfying a user-defined FDR. An effective algorithm, Discriminant-Cut, has been developed to solve an instantiation of this problem. Extensive comparisons of Discriminant-Cut with 13 existing methods were carried out to demonstrate its robustness and effectiveness.
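
The idea of maximizing discoveries under an FDR constraint in a space of several test statistics can be illustrated with a toy sketch. The code below is not Discriminant-Cut; it is a simplified stand-in that assumes two basic attributes (the absolute Welch t-statistic and the absolute difference of group means), a linear boundary chosen from a small grid of directions, and an FDR estimated by permuting sample labels. The function names `attributes`, `max_discoveries`, and `fit_linear_cut`, and all parameter defaults, are hypothetical and chosen only for illustration.

```python
import numpy as np


def attributes(x, y):
    """Two basic attributes per feature: |Welch t| and |difference of means|.

    x, y: arrays of shape (n_features, n_samples) for the two conditions.
    """
    mx, my = x.mean(axis=1), y.mean(axis=1)
    vx, vy = x.var(axis=1, ddof=1), y.var(axis=1, ddof=1)
    t = (mx - my) / np.sqrt(vx / x.shape[1] + vy / y.shape[1] + 1e-12)
    return np.column_stack([np.abs(t), np.abs(mx - my)])


def max_discoveries(obs, null, alpha):
    """Return (number called, cut) for the lowest cut with estimated FDR <= alpha.

    obs: (n_features,) observed scores.
    null: (n_perm, n_features) scores computed under permuted labels.
    """
    cuts = np.sort(obs)[::-1]  # candidate cuts, from strictest to loosest
    null_sorted = np.sort(null.ravel())
    obs_sorted = np.sort(obs)
    # Average number of permutation scores at or above each cut.
    exp_false = (null_sorted.size - np.searchsorted(null_sorted, cuts, side="left")) / null.shape[0]
    # Number of observed scores at or above each cut.
    n_called = obs_sorted.size - np.searchsorted(obs_sorted, cuts, side="left")
    ok = np.nonzero(exp_false / n_called <= alpha)[0]
    if ok.size == 0:
        return 0, np.inf
    k = ok[-1]  # loosest admissible cut, hence the most discoveries
    return int(n_called[k]), float(cuts[k])


def fit_linear_cut(x, y, alpha=0.05, n_perm=50, n_angles=7, seed=0):
    """Search a grid of linear directions for the one that calls the most features."""
    rng = np.random.default_rng(seed)
    data = np.hstack([x, y])
    n_x = x.shape[1]
    obs = attributes(x, y)
    scale = obs.std(axis=0) + 1e-12  # put both attributes on a comparable scale
    null = []
    for _ in range(n_perm):
        p = data[:, rng.permutation(data.shape[1])]  # shuffle sample labels
        null.append(attributes(p[:, :n_x], p[:, n_x:]))
    null = np.stack(null)  # shape (n_perm, n_features, 2)
    best = (0, None, np.inf)
    for theta in np.linspace(0.0, np.pi / 2, n_angles):
        w = np.array([np.cos(theta), np.sin(theta)]) / scale
        n, cut = max_discoveries(obs @ w, null @ w, alpha)
        if n > best[0]:
            best = (n, w, cut)
    return best  # (discoveries, weight vector, cut on the weighted score)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(1000, 8))
    y = rng.normal(size=(1000, 8))
    y[:100] += 3.0  # simulated data: the first 100 features are truly differential
    n_found, weights, cut = fit_linear_cut(x, y, alpha=0.05)
    print(n_found, weights, cut)
```

A feature is called when its weighted attribute score is at or above the returned cut. Because the direction is selected with the same permutation estimate that is then used to report the FDR, the estimate in this sketch is likely optimistic; a more careful treatment would validate the chosen boundary on held-out permutations or data.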

CONCLUSIONS

We have developed a novel machine learning methodology for robust differential expression analysis, which can be a new avenue to significantly advance research on large-scale differential expression analysis.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b71d/5168810/8df8232fc3b2/12859_2016_1386_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b71d/5168810/41a28a16dad0/12859_2016_1386_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b71d/5168810/cadf70dec761/12859_2016_1386_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b71d/5168810/c1351838d467/12859_2016_1386_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b71d/5168810/daa22f402a8c/12859_2016_1386_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b71d/5168810/bbd99faee01c/12859_2016_1386_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b71d/5168810/e7f94a82b784/12859_2016_1386_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b71d/5168810/e229570be8bb/12859_2016_1386_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b71d/5168810/a2184180883d/12859_2016_1386_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b71d/5168810/b1ba517dc7ed/12859_2016_1386_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b71d/5168810/0713bd85f279/12859_2016_1386_Fig11_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b71d/5168810/f5d7649065f4/12859_2016_1386_Fig12_HTML.jpg
