Missing value imputation for gene expression data by tailored nearest neighbors.

Authors

Faisal Shahla, Tutz Gerhard

Publication information

Stat Appl Genet Mol Biol. 2017 Apr 25;16(2):95-106. doi: 10.1515/sagmb-2015-0098.

DOI: 10.1515/sagmb-2015-0098
PMID: 28593876

Abstract

High-dimensional data such as gene expression and RNA-sequence data often contain missing values. Subsequent analyses, and the results derived from them, can suffer strongly from these missing values. Several approaches to imputing missing values in gene expression data have been developed, but the task is difficult because of the high dimensionality (number of genes) of the data. Here an imputation procedure is proposed that uses weighted nearest neighbors. Instead of defining nearest neighbors by a distance that includes all genes, the distance is computed only over genes that are apt to contribute to the accuracy of the imputed values. The method aims to avoid the curse of dimensionality, which typically arises when local methods such as nearest neighbors are applied in high-dimensional settings. The proposed weighted nearest neighbors algorithm is compared with existing missing value imputation techniques: mean imputation, KNNimpute, and the recently proposed imputation by random forests. RNA-sequence and microarray data from studies on human cancer are used to compare the performance of the methods. Results from simulations as well as real studies show that the weighted distance procedure can successfully handle missing values in high-dimensional data structures where the number of predictors is larger than the number of samples. The method typically outperforms the considered competitors.
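
The general idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the gene-selection rule (absolute Pearson correlation with the target gene), the Gaussian kernel, the temporary column-mean fill, and the names `impute_weighted_knn`, `k`, `n_genes`, and `bandwidth` are all assumptions made for illustration. The sketch also assumes every gene has at least one observed value.

```python
import numpy as np

def impute_weighted_knn(X, k=5, n_genes=10, bandwidth=1.0):
    """Fill NaNs in X (rows = samples, columns = genes).

    Illustrative only: for a missing entry (i, j), sample distances are
    computed over the n_genes genes most correlated with gene j, and the
    value is a kernel-weighted mean over the k nearest donor samples.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()

    # Temporary column-mean fill, used only for correlations and distances.
    col_means = np.nanmean(X, axis=0)
    X0 = np.where(np.isnan(X), col_means, X)

    # Absolute gene-gene correlations; constant genes give NaN, set to 0.
    corr = np.nan_to_num(np.abs(np.corrcoef(X0, rowvar=False)))
    np.fill_diagonal(corr, -1.0)  # never select the target gene itself

    for i, j in zip(*np.where(np.isnan(X))):
        # Donors are samples in which gene j was actually observed.
        donors = np.where(~np.isnan(X[:, j]))[0]
        # Genes assumed most informative for gene j (hypothetical rule).
        genes = np.argsort(corr[j])[::-1][:n_genes]
        # Root-mean-square distance restricted to the selected genes.
        diff = X0[donors][:, genes] - X0[i, genes]
        d = np.sqrt(np.mean(diff ** 2, axis=1))
        nearest = np.argsort(d)[:k]
        # Gaussian kernel weights; fall back to equal weights on underflow.
        w = np.exp(-0.5 * (d[nearest] / bandwidth) ** 2)
        if w.sum() == 0.0:
            w = np.ones_like(w)
        filled[i, j] = np.sum(w * X[donors[nearest], j]) / np.sum(w)
    return filled
```

Restricting the distance to a small set of selected genes is what distinguishes this kind of procedure from a plain k-nearest-neighbor imputation over all genes; with `n_genes` equal to the number of genes and equal weights, the sketch reduces to an unweighted nearest-neighbor average.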

Similar articles

1
Missing value imputation for gene expression data by tailored nearest neighbors.
Stat Appl Genet Mol Biol. 2017 Apr 25;16(2):95-106. doi: 10.1515/sagmb-2015-0098.
2
A global learning with local preservation method for microarray data imputation.
Comput Biol Med. 2016 Oct 1;77:76-89. doi: 10.1016/j.compbiomed.2016.08.005. Epub 2016 Aug 5.
3
Robust imputation method for missing values in microarray data.
BMC Bioinformatics. 2007 May 3;8 Suppl 2(Suppl 2):S6. doi: 10.1186/1471-2105-8-S2-S6.
4
Advanced methods for missing values imputation based on similarity learning.
PeerJ Comput Sci. 2021 Jul 21;7:e619. doi: 10.7717/peerj-cs.619. eCollection 2021.
5
Two-pass imputation algorithm for missing value estimation in gene expression time series.
J Bioinform Comput Biol. 2007 Oct;5(5):1005-22. doi: 10.1142/s0219720007003053.
6
Missing value imputation in DNA microarrays based on conjugate gradient method.
Comput Biol Med. 2012 Feb;42(2):222-7. doi: 10.1016/j.compbiomed.2011.11.011. Epub 2011 Dec 10.
7
Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data.
Bioinformatics. 2005 May 15;21(10):2417-23. doi: 10.1093/bioinformatics/bti345. Epub 2005 Feb 24.
8
DNA microarray data imputation and significance analysis of differential expression.
Bioinformatics. 2005 Nov 15;21(22):4155-61. doi: 10.1093/bioinformatics/bti638. Epub 2005 Aug 23.
9
Missing value estimation for DNA microarray gene expression data: local least squares imputation.
Bioinformatics. 2005 Jan 15;21(2):187-98. doi: 10.1093/bioinformatics/bth499. Epub 2004 Aug 27.
10
Improving missing value imputation of microarray data by using spot quality weights.
BMC Bioinformatics. 2006 Jun 16;7:306. doi: 10.1186/1471-2105-7-306.

Cited by

1
Exposure-inducible genes may contribute to missingness in RNAseq-based gene expression analyses.
Sci Rep. 2025 Aug 22;15(1):30889. doi: 10.1038/s41598-025-14395-0.
2
Association of mitochondrial RNA expression levels in saliva and plasma with interferon signature gene expression and disease activity in patients with Sjögren disease.
RMD Open. 2025 May 13;11(2):e005166. doi: 10.1136/rmdopen-2024-005166.
3
Procrustes is a machine-learning approach that removes cross-platform batch effects from clinical RNA sequencing data.
Commun Biol. 2024 Mar 30;7(1):392. doi: 10.1038/s42003-024-06020-z.
4
Microarray Data Preprocessing: From Experimental Design to Differential Analysis.
Methods Mol Biol. 2022;2401:79-100. doi: 10.1007/978-1-0716-1839-4_7.
5
A deep learning approach for predicting severity of COVID-19 patients using a parsimonious set of laboratory markers.
iScience. 2021 Dec 17;24(12):103523. doi: 10.1016/j.isci.2021.103523. Epub 2021 Nov 27.
6
Genomic data imputation with variational auto-encoders.
Gigascience. 2020 Aug 1;9(8). doi: 10.1093/gigascience/giaa082.
7
Gene expression biomarkers for kidney transplant rejection-The entire landscape-Author's reply.
EBioMedicine. 2019 Apr;42:42. doi: 10.1016/j.ebiom.2019.03.061. Epub 2019 Mar 28.
8
Network Representation of T-Cell Repertoire- A Novel Tool to Analyze Immune Response to Cancer Formation.
Front Immunol. 2018 Dec 11;9:2913. doi: 10.3389/fimmu.2018.02913. eCollection 2018.