A GMM-IG framework for selecting genes as expression panel biomarkers.

Affiliation

School of Informatics, Indiana University, 535 W. Michigan Street, Indianapolis, IN 46202, USA.

Publication information

Artif Intell Med. 2010 Feb-Mar;48(2-3):75-82. doi: 10.1016/j.artmed.2009.07.006. Epub 2009 Dec 8.


DOI:10.1016/j.artmed.2009.07.006
PMID:20004087
Abstract

OBJECTIVE: The small sample sizes of functional genomics experiments make it necessary to integrate DNA microarray data from different sources. However, experimental noise and platform-specific biases make integrated data analysis challenging. In this work, we propose an integrative computational framework to identify candidate biomarker genes from publicly available functional genomics studies.

METHODS: We developed a new framework, Gaussian Mixture Modeling-Coupled Information Gain (GMM-IG). In this framework, we first apply a two-component Gaussian mixture model (GMM) to estimate the conditional probability distributions of gene expression data for two different types of samples, for example, normal versus cancer. An expectation-maximization algorithm is then used to estimate the maximum-likelihood parameters of the two-Gaussian mixture in the feature space and to determine the underlying expression levels of genes. Gene expression results from different studies are discretized based on the GMM estimates and then unified. Significantly differentially expressed genes are filtered and assessed with information gain (IG) measures.

RESULTS: DNA microarray data for lung cancer from three prior studies were processed using the new GMM-IG method. Target gene markers from a gene expression panel were selected and compared against several conventional computational biomarker data analysis methods. GMM-IG showed consistently high accuracy in several classification assessments. Statistical validation also showed high reproducibility of the gene selection results. Our study shows that the GMM-IG framework can overcome the poor reliability of single-study DNA microarray experiments while maintaining high accuracy by combining true signals from multiple studies.

CONCLUSIONS: We present a conceptually simple framework that enables reliable integration of true differential gene expression signals from multiple microarray experiments. This novel computational method has been shown to generate interesting biomarker panels for lung cancer studies. It is promising as a general strategy for future panel biomarker development, especially for applications that require integrating experimental results generated at different research centers or with different technology platforms.
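The pipeline described under METHODS (two-component GMM fitted by expectation-maximization, per-study discretization, unification, information gain) can be illustrated for a single gene. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the percentile-based initialization, the variance floor, and the synthetic two-study data are all hypothetical choices made for illustration. The abstract does not specify how discretized states are unified, so the sketch simply concatenates them across studies.

```python
import numpy as np


def _weighted_densities(x, weights, means, variances):
    """Weighted Gaussian density of each value under each component (n x 2)."""
    diff = np.asarray(x, dtype=float)[:, None] - means
    return weights * np.exp(-0.5 * diff**2 / variances) / np.sqrt(2 * np.pi * variances)


def fit_gmm_1d(x, n_iter=200, tol=1e-8):
    """Fit a two-component 1-D Gaussian mixture by EM (illustrative settings)."""
    x = np.asarray(x, dtype=float)
    means = np.percentile(x, [25, 75])       # assumed initialization
    variances = np.full(2, x.var() + 1e-6)   # small floor avoids zero variance
    weights = np.array([0.5, 0.5])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value.
        dens = _weighted_densities(x, weights, means, variances)
        total = dens.sum(axis=1, keepdims=True)
        resp = dens / total
        # M-step: re-estimate weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
        ll = float(np.log(total).sum())
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return weights, means, variances


def discretize(x):
    """Return 1 where the higher-mean component is more probable, else 0."""
    weights, means, variances = fit_gmm_1d(x)
    dens = _weighted_densities(x, weights, means, variances)
    return (dens.argmax(axis=1) == means.argmax()).astype(int)


def entropy(labels):
    """Shannon entropy (bits) of a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(states, classes):
    """IG = H(classes) - H(classes | states)."""
    states, classes = np.asarray(states), np.asarray(classes)
    conditional = sum(
        (states == s).mean() * entropy(classes[states == s]) for s in np.unique(states)
    )
    return entropy(classes) - conditional


def unified_information_gain(studies):
    """Discretize one gene within each study, pool the states, then score by IG."""
    states = np.concatenate([discretize(values) for values, _ in studies])
    classes = np.concatenate([np.asarray(c) for _, c in studies])
    return information_gain(states, classes)


# Synthetic example: the same gene measured in two hypothetical studies whose
# platforms report on different scales (study B is shifted by +10).
rng = np.random.default_rng(0)
classes = np.repeat([0, 1], 20)  # 0 = normal, 1 = cancer
study_a = (np.concatenate([rng.normal(2.0, 0.5, 20), rng.normal(6.0, 0.5, 20)]), classes)
study_b = (np.concatenate([rng.normal(12.0, 0.5, 20), rng.normal(16.0, 0.5, 20)]), classes)
print(unified_information_gain([study_a, study_b]))
```

Because each study is discretized against its own fitted mixture, the platform offset in the second study drops out before the states are pooled. Pooling the raw values instead would mix the two scales.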


Similar articles

[1]
A GMM-IG framework for selecting genes as expression panel biomarkers.

Artif Intell Med. 2009-12-8

[2]
Mixture classification model based on clinical markers for breast cancer prognosis.

Artif Intell Med. 2009-12-14

[3]
Clustering of high-dimensional gene expression data with feature filtering methods and diffusion maps.

Artif Intell Med. 2009-12-4

[4]
Ensemble gene selection by grouping for microarray data classification.

J Biomed Inform. 2009-8-20

[5]
Robust prostate cancer marker genes emerge from direct integration of inter-study microarray data.

Bioinformatics. 2005-10-15

[6]
A mixture model with random-effects components for clustering correlated gene-expression profiles.

Bioinformatics. 2006-7-15

[7]
Tumor classification ranking from microarray data.

BMC Genomics. 2008-9-16

[8]
A statistical method for estimating the proportion of differentially expressed genes.

Comput Biol Chem. 2006-6

[9]
Selecting a minimal number of relevant genes from microarray data to design accurate tissue classifiers.

Biosystems. 2007

[10]
Weighted lasso in graphical Gaussian modeling for large gene network estimation based on microarray data.

Genome Inform. 2007

Cited by

[1]
Artificial Intelligence in Point-of-Care Biosensing: Challenges and Opportunities.

Diagnostics (Basel). 2024-5-25

[2]
The g3mclass is a practical software for multiclass classification on biomarkers.

Sci Rep. 2022-11-5

[3]
On integrating multi-experiment microarray data.

Philos Trans A Math Phys Eng Sci. 2014-4-21
