

Missing value imputation on gene expression data using bee-based algorithm to improve classification performance.

Affiliations

Department of Computer Science, Faculty of Science and Technology, Thammasat University (Rangsit Campus), Pathum Thani, Thailand.

Thammasat University Research Unit in Data Innovation and Artificial Intelligence, Thammasat University (Rangsit Campus), Pathum Thani, Thailand.

Publication details

PLoS One. 2024 Aug 29;19(8):e0305492. doi: 10.1371/journal.pone.0305492. eCollection 2024.

DOI:10.1371/journal.pone.0305492
PMID:39208345
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11361674/
Abstract

Existing missing value imputation methods focus on imputing values that approximate the actual values, so that the completed dataset can serve as input for machine learning tasks. This work proposes a missing value imputation method aimed at improving classification accuracy. The proposed method is based on the bee algorithm and uses k-nearest neighbors with linear regression to guide the search toward an appropriate solution and prevent randomness. Within this process, the GINI importance score is used to select values for imputation. The imputed values therefore aim to improve discriminative power in classification tasks instead of replicating the actual values of the original dataset. In this study, we evaluated the proposed method against frequently used imputation methods, namely k-nearest neighbors, principal component analysis, and nonlinear principal component analysis, comparing their root mean square error and the accuracy obtained when the imputed datasets are used in a classification task. The experimental results indicated that our proposed method obtained the best accuracy on all datasets compared with the other methods. Compared with the original dataset, classification models built from the imputed datasets yielded 15-25% higher accuracy in class prediction. Further analysis showed that the feature ranking used in the classification process was affected, leading to a noticeable change in informativeness, as the data imputed by the proposed method boosted discriminating power.
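The abstract only outlines the method, so the following is a hypothetical, simplified sketch of the idea and not the authors' implementation. Assumptions: each column with missing cells is imputed separately; a kNN mean (the hypothetical `knn_seed` helper) stands in for the paper's kNN plus linear-regression guidance and only seeds the search; the fitness is the Gini impurity decrease of the best single threshold split on the completed column, used as a stand-in for the GINI importance score; and the search follows the general scheme of the Bees Algorithm (scout bees, best sites, recruited bees doing local search). All function names and parameter values (`n_scouts`, `n_best`, `n_recruits`, `radius`, `iters`) are illustrative.

```python
# Hypothetical sketch of bee-style, class-aware imputation; NOT the paper's code.
# Assumptions: kNN mean seeds the search (no linear regression), and a Gini
# split gain stands in for the GINI importance score used by the authors.
import random
from statistics import mean, pstdev


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0


def best_split_gain(values, labels):
    """Largest Gini impurity decrease from a single threshold split on one feature."""
    pairs = sorted(zip(values, labels))
    n, parent, best = len(pairs), gini(labels), 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold falls between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def knn_seed(rows, r, j, k):
    """kNN estimate for cell (r, j): mean of column j over the k nearest donor rows."""
    def dist(other):
        sq = [(x - y) ** 2 for c, (x, y) in enumerate(zip(rows[r], other))
              if c != j and x is not None and y is not None]
        return (sum(sq) / len(sq)) ** 0.5 if sq else float("inf")
    donors = sorted((row for row in rows if row[j] is not None), key=dist)
    return mean(row[j] for row in donors[:k])


def bee_impute_column(rows, labels, j, k=3, n_scouts=10, n_best=3,
                      n_recruits=5, radius=0.5, iters=20, seed=0):
    """Bees-algorithm-style search over the values of the missing cells in column j."""
    rng = random.Random(seed)
    missing = [r for r, row in enumerate(rows) if row[j] is None]
    spread = pstdev([row[j] for row in rows if row[j] is not None]) or 1.0
    centre = [knn_seed(rows, r, j, k) for r in missing]

    def fitness(sol):
        # Class separability of column j once the candidate values are filled in.
        col = [row[j] for row in rows]
        for r, v in zip(missing, sol):
            col[r] = v
        return best_split_gain(col, labels)

    def neighbour(sol, scale):
        # Perturb every candidate value by up to scale * column spread.
        return [v + rng.uniform(-scale, scale) * spread for v in sol]

    # Scouts start around the kNN seed (which is itself kept as one site).
    sites = [centre] + [neighbour(centre, radius) for _ in range(n_scouts - 1)]
    for _ in range(iters):
        sites.sort(key=fitness, reverse=True)
        new_sites = []
        for site in sites[:n_best]:
            # Recruited bees search a smaller patch; the site survives unless beaten.
            patch = [site] + [neighbour(site, radius / 2) for _ in range(n_recruits)]
            new_sites.append(max(patch, key=fitness))
        # The remaining scouts re-explore around the kNN seed.
        new_sites += [neighbour(centre, radius) for _ in range(n_scouts - n_best)]
        sites = new_sites
    best = max(sites, key=fitness)
    filled = [list(row) for row in rows]
    for r, v in zip(missing, best):
        filled[r][j] = v
    return filled


def bee_impute(rows, labels, **params):
    """Impute every column that has missing cells (None), one column at a time."""
    out = [list(row) for row in rows]
    for j in range(len(out[0])):
        if any(row[j] is None for row in out):
            out = bee_impute_column(out, labels, j, **params)
    return out
```

Because the fitness rewards class separation rather than agreement with the true values, the returned cells can move away from the kNN estimate, which matches the goal stated in the abstract. Note that this sketch uses the class labels during imputation, so any evaluation of it should fit the imputer on training data only.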

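The evaluation described in the abstract (root mean square error of the imputed values, plus classification accuracy on the imputed data) can be sketched for the k-nearest-neighbors baseline as follows. This is an illustrative protocol under stated assumptions: cells are hidden at random from a complete matrix so that RMSE can be computed against the known values, and a leave-one-out 1-nearest-neighbor classifier stands in for the classifier, which the abstract does not name. The PCA-based baselines are not sketched, and the names `knn_impute`, `mask_random`, `rmse`, and `loo_1nn_accuracy` are hypothetical.

```python
# Hypothetical evaluation sketch: kNN baseline imputation, RMSE on hidden
# cells, and leave-one-out 1-NN accuracy (stand-in classifier).
import math
import random


def knn_impute(rows, k=3):
    """kNN baseline: fill each missing cell with the mean of its column over the
    k nearest rows that observe that column (RMS distance on shared columns)."""
    def dist(a, b, skip):
        sq = [(x - y) ** 2 for c, (x, y) in enumerate(zip(a, b))
              if c != skip and x is not None and y is not None]
        return math.sqrt(sum(sq) / len(sq)) if sq else math.inf

    out = [list(row) for row in rows]
    for r, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is None:
                donors = sorted((o for o in rows if o[j] is not None),
                                key=lambda o: dist(row, o, j))[:k]
                out[r][j] = sum(o[j] for o in donors) / len(donors)
    return out


def mask_random(rows, n_hidden, seed=0):
    """Hide n_hidden random cells of a complete matrix; return (masked, hidden).
    Keep n_hidden below the row count so no column is hidden entirely."""
    rng = random.Random(seed)
    cells = [(r, j) for r in range(len(rows)) for j in range(len(rows[0]))]
    hidden = rng.sample(cells, n_hidden)
    masked = [list(row) for row in rows]
    for r, j in hidden:
        masked[r][j] = None
    return masked, hidden


def rmse(true_rows, imputed_rows, hidden):
    """Root mean square error over the hidden cells only."""
    sq = [(true_rows[r][j] - imputed_rows[r][j]) ** 2 for r, j in hidden]
    return math.sqrt(sum(sq) / len(sq))


def loo_1nn_accuracy(rows, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (Euclidean)."""
    hits = 0
    for i, row in enumerate(rows):
        nearest = min((m for m in range(len(rows)) if m != i),
                      key=lambda m: math.dist(row, rows[m]))
        hits += labels[nearest] == labels[i]
    return hits / len(rows)


if __name__ == "__main__":
    data = [[1.0, 5.0], [1.2, 4.8], [0.8, 5.1], [0.9, 5.3],
            [3.0, 5.0], [3.1, 5.2], [2.8, 4.9], [2.9, 5.1]]
    classes = [0, 0, 0, 0, 1, 1, 1, 1]
    masked, hidden = mask_random(data, n_hidden=4)
    imputed = knn_impute(masked)
    print("RMSE:", rmse(data, imputed, hidden))
    print("LOO 1-NN accuracy:", loo_1nn_accuracy(imputed, classes))
```

Reporting both numbers side by side reflects the contrast drawn in the abstract: a method can have a worse RMSE than the kNN baseline while still producing higher classification accuracy.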

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9e0/11361674/52371879eb62/pone.0305492.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9e0/11361674/e5f2d098a708/pone.0305492.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9e0/11361674/08a181302653/pone.0305492.g003.jpg

Similar articles

1. Missing value imputation on gene expression data using bee-based algorithm to improve classification performance.
   PLoS One. 2024 Aug 29;19(8):e0305492. doi: 10.1371/journal.pone.0305492. eCollection 2024.
2. Advanced methods for missing values imputation based on similarity learning.
   PeerJ Comput Sci. 2021 Jul 21;7:e619. doi: 10.7717/peerj-cs.619. eCollection 2021.
3. Imputation of Gene Expression Data in Blood Cancer and Its Significance in Inferring Biological Pathways.
   Front Oncol. 2020 Jan 8;9:1442. doi: 10.3389/fonc.2019.01442. eCollection 2019.
4. Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data.
   Bioinformatics. 2005 May 15;21(10):2417-23. doi: 10.1093/bioinformatics/bti345. Epub 2005 Feb 24.
5. Exploiting mutual information for the imputation of static and dynamic mixed-type clinical data with an adaptive k-nearest neighbours approach.
   BMC Med Inform Decis Mak. 2020 Aug 20;20(Suppl 5):174. doi: 10.1186/s12911-020-01166-2.
6. Comparison of the effects of imputation methods for missing data in predictive modelling of cohort study datasets.
   BMC Med Res Methodol. 2024 Feb 16;24(1):41. doi: 10.1186/s12874-024-02173-x.
7. Missing value imputation for gene expression data by tailored nearest neighbors.
   Stat Appl Genet Mol Biol. 2017 Apr 25;16(2):95-106. doi: 10.1515/sagmb-2015-0098.
8. Impact of missing data imputation methods on gene expression clustering and classification.
   BMC Bioinformatics. 2015 Feb 26;16:64. doi: 10.1186/s12859-015-0494-3.
9. Two-pass imputation algorithm for missing value estimation in gene expression time series.
   J Bioinform Comput Biol. 2007 Oct;5(5):1005-22. doi: 10.1142/s0219720007003053.
10. A hybrid imputation approach for microarray missing value estimation.
   BMC Genomics. 2015;16 Suppl 9(Suppl 9):S1. doi: 10.1186/1471-2164-16-S9-S1. Epub 2015 Aug 17.
