Effects of replacing the unreliable cDNA microarray measurements on the disease classification based on gene expression profiles and functional modules.

Authors

Wang Dong, Lv Yingli, Guo Zheng, Li Xia, Li Yanhui, Zhu Jing, Yang Da, Xu Jianzhen, Wang Chenguang, Rao Shaoqi, Yang Baofeng

Affiliation

Department of Bioinformatics and Bio-pharmaceutical Key Laboratory of Heilongjiang Province and State, Harbin Medical University, Harbin 150086, China.

Publication information

Bioinformatics. 2006 Dec 1;22(23):2883-9. doi: 10.1093/bioinformatics/btl339. Epub 2006 Jun 29.

DOI:10.1093/bioinformatics/btl339
PMID:16809389
Abstract

MOTIVATION

Microarray datasets frequently contain a large number of missing values (MVs), which need to be estimated and replaced before subsequent data mining. This paper studies how different MV treatments for cDNA microarray data affect disease classification analysis.
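To make "estimated and replaced" concrete, the following is a minimal sketch of two MV treatments, zero replacement and a simplified nearest-neighbor imputation. It is an illustration under stated assumptions, not the implementation used in the paper: missing values are encoded as `None`, rows are genes, columns are samples, and the function names (`impute_zero`, `impute_knn`) and the toy matrix are hypothetical. The neighbor average is unweighted, which may differ from published KNN imputation variants.

```python
import math

def impute_zero(matrix):
    """Return a copy of the matrix with every missing value (None) set to 0.0."""
    return [[0.0 if v is None else v for v in row] for row in matrix]

def _distance(a, b):
    """Root-mean-square difference over samples observed in both genes.

    Returns infinity when the two genes share no observed sample.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))

def impute_knn(matrix, k=2):
    """Fill each missing entry with the unweighted mean of the same sample
    in the k most similar genes that have that sample observed."""
    result = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbors: other genes with sample j observed.
            candidates = [
                (_distance(row, other), other[j])
                for m, other in enumerate(matrix)
                if m != i and other[j] is not None
            ]
            candidates = [c for c in candidates if c[0] != math.inf]
            candidates.sort(key=lambda c: c[0])
            nearest = [val for _, val in candidates[:k]]
            # Assumption: fall back to zero when no usable neighbor exists.
            result[i][j] = sum(nearest) / len(nearest) if nearest else 0.0
    return result

# Toy expression matrix (hypothetical values): 4 genes x 4 samples.
expr = [
    [1.0, 2.0, 3.0, None],
    [1.1, 2.1, 2.9, 4.0],
    [0.9, 1.9, 3.1, 4.2],
    [5.0, 5.0, 5.0, 9.0],
]

zero_filled = impute_zero(expr)
knn_filled = impute_knn(expr, k=2)
print(zero_filled[0][3], knn_filled[0][3])
```

In this toy matrix the first gene closely tracks the second and third, so the neighbor-based estimate borrows from those two genes rather than from the dissimilar fourth gene, whereas zero replacement ignores the expression pattern entirely.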

RESULTS

By analyzing five datasets, we demonstrate that among the three kinds of classifiers evaluated in this study, support vector machine (SVM) classifiers are robust to varied MV imputation methods [e.g. replacing MVs by zero, the K nearest-neighbor (KNN) imputation algorithm, local least squares imputation and Bayesian principal component analysis], while the classification and regression tree classifiers are sensitive in terms of classification accuracy. The KNN classifiers built on differentially expressed genes (DEGs) are robust to the varied MV treatments, but the performance of the KNN classifiers based on all measured genes can deteriorate significantly when MVs are imputed for genes with a larger missing rate (MR) (e.g. MR > 5%). Generally, while replacing MVs by zero performs relatively poorly, the other imputation algorithms differ little in their effect on the classification performance of the SVM or KNN classifiers. We further demonstrate the power and feasibility of our recently proposed functional expression profile (FEP) approach as a means to handle microarray data with MVs. The FEPs, which are derived from the functional modules that are enriched with sets of DEGs and thus can be consistently identified under varied MV treatments, achieve precise disease classification with better biological interpretation. We conclude that the choice of MV treatment should be determined in the context of the approach later used for disease classification. The suggested exclusion criterion of ignoring the genes with a larger MR (e.g. >5%), while justifiable for some classifiers such as KNN classifiers, might not be suitable as a general rule for all classifiers.
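The exclusion criterion discussed above (ignoring genes whose MR exceeds a threshold such as 5%) can be sketched as follows. This is an illustrative assumption about how such a filter might be written, not code from the paper: missing values are encoded as `None`, each row is one gene measured across samples, and the names `missing_rate` and `split_by_missing_rate` and the gene identifiers are hypothetical.

```python
def missing_rate(row):
    """Fraction of samples in which this gene's measurement is missing (None)."""
    return sum(v is None for v in row) / len(row)

def split_by_missing_rate(matrix, gene_ids, max_mr=0.05):
    """Split genes into those kept (MR <= max_mr) and those excluded (MR > max_mr)."""
    kept, excluded = [], []
    for gene, row in zip(gene_ids, matrix):
        (kept if missing_rate(row) <= max_mr else excluded).append(gene)
    return kept, excluded

# Toy data (hypothetical): 3 genes measured across 20 samples.
gene_ids = ["geneA", "geneB", "geneC"]
matrix = [
    [1.0] * 20,               # no missing values, MR = 0%
    [None] + [1.0] * 19,      # one missing value, MR = 5%
    [None] * 3 + [1.0] * 17,  # three missing values, MR = 15%
]

kept, excluded = split_by_missing_rate(matrix, gene_ids, max_mr=0.05)
print(kept, excluded)
```

Keeping `max_mr` as a parameter reflects the abstract's conclusion that a fixed 5% cutoff may suit some classifiers, such as KNN, without being a general rule for all of them.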


