Missing value imputation for microRNA expression data by using a GO-based similarity measure.

Authors

Yang Yang, Xu Zhuangdi, Song Dandan

Affiliations

Department of Computer Science and Engineering, Shanghai Jiao Tong University, 800 Dongchuan Rd., Shanghai, 200240, China.

Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, 200240, China.

Publication information

BMC Bioinformatics. 2016 Jan 11;17 Suppl 1(Suppl 1):10. doi: 10.1186/s12859-015-0853-0.

DOI:10.1186/s12859-015-0853-0
PMID:26818962
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4895707/
Abstract

BACKGROUND

Missing values are common in microarray expression profiles. Rather than discarding genes or samples with incomplete expression levels, missing values should be imputed properly so that downstream analysis remains accurate. Imputation methods fall roughly into two categories: expression level-based and domain knowledge-based. Methods of the first type rely only on expression data, without external data sources, while methods of the second type incorporate available domain knowledge alongside expression data to improve imputation accuracy. In recent years, microRNA (miRNA) microarrays have been widely developed and used to identify miRNA biomarkers in studies of complex human diseases. As with mRNA profiles, miRNA expression profiles with missing values can be handled with existing imputation methods. However, domain knowledge-based methods are hard to apply because miRNAs lack direct functional annotation. With the rapid accumulation of miRNA microarray data, there is a growing need for domain knowledge-based imputation algorithms specific to miRNA expression profiles, to improve the quality of miRNA data analysis.
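
As a rough illustration of the expression level-based category, the sketch below fills a missing entry from the k most similar rows, using only the expression matrix. It is a generic nearest-neighbour scheme and not necessarily the baseline evaluated in the paper; the function names, the RMS distance, and the inverse-distance weighting are assumptions made for this example.

```python
import math
from typing import List, Optional, Sequence

Row = Sequence[Optional[float]]


def row_distance(a: Row, b: Row) -> float:
    """RMS difference over the columns observed in both rows (inf if none)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))


def knn_impute(data: Sequence[Row], k: int = 2) -> List[List[Optional[float]]]:
    """Fill each missing entry from the k nearest rows that observe that column.

    Neighbours are weighted by inverse distance (an illustrative choice).
    Entries with no usable neighbour are left as None.
    """
    filled = [list(row) for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(data):
                if m == i or other[j] is None:
                    continue
                d = row_distance(row, other)
                if math.isfinite(d):
                    candidates.append((d, other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = candidates[:k]
            if not nearest:
                continue
            weights = [1.0 / (d + 1e-9) for d, _ in nearest]
            filled[i][j] = sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
    return filled


# Toy profiles: rows are miRNAs, columns are samples, None marks a missing value.
profiles = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, None],
    [5.0, 6.0, 7.0],
]
print(knn_impute(profiles, k=1)[1][2])
```

With k=1, the missing entry is taken from the first row, which matches the incomplete row exactly on the two observed samples, so the estimate is approximately 3.0.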

RESULTS

We connect miRNAs with domain knowledge of Gene Ontology (GO) via their target genes, and define miRNA functional similarity based on the semantic similarity of GO terms in GO graphs. A new measure combining miRNA functional similarity and expression similarity is used in the imputation of missing values. The new measure is tested on two miRNA microarray datasets from breast cancer research and achieves improved performance compared with the expression-based method on both datasets.
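
The idea of mixing functional and expression similarity when choosing neighbours can be sketched as follows. The abstract does not state how the two similarities are combined, so the convex combination with weight `alpha` is an assumption, as are the function names and the mapping from distance to similarity. The GO-based functional similarity is treated here as a precomputed, hypothetical matrix `func_sim` with values in [0, 1]; computing it from target genes and GO graphs is outside this sketch.

```python
import math
from typing import Optional, Sequence

Row = Sequence[Optional[float]]


def expression_similarity(a: Row, b: Row) -> float:
    """Map the RMS difference over co-observed columns into (0, 1]; 0.0 if none."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.0
    rms = math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))
    return 1.0 / (1.0 + rms)


def combined_similarity(expr_sim: float, go_sim: float, alpha: float) -> float:
    """Assumed combination: alpha weights the functional (GO-based) part."""
    return alpha * go_sim + (1.0 - alpha) * expr_sim


def impute_entry(data: Sequence[Row], func_sim: Sequence[Sequence[float]],
                 i: int, j: int, k: int = 2, alpha: float = 0.5) -> float:
    """Estimate data[i][j] as a similarity-weighted mean over the top-k rows."""
    scored = []
    for m, other in enumerate(data):
        if m == i or other[j] is None:
            continue
        s = combined_similarity(
            expression_similarity(data[i], other), func_sim[i][m], alpha
        )
        scored.append((s, other[j]))
    scored.sort(key=lambda t: t[0], reverse=True)
    top = scored[:k]
    total = sum(s for s, _ in top)
    if total == 0:
        raise ValueError("no usable neighbours for this entry")
    return sum(s * v for s, v in top) / total


# Toy data: rows 1 and 2 look identical to row 0 on the observed samples,
# so expression alone cannot tell them apart; func_sim favours row 1.
profiles = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 9.0],
]
func_sim = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
print(impute_entry(profiles, func_sim, 0, 2, k=2, alpha=0.0))  # expression only
print(impute_entry(profiles, func_sim, 0, 2, k=2, alpha=1.0))  # functional only
```

In this toy case, expression similarity alone averages the two candidate values (about 6.0), while the functional term pulls the estimate toward the functionally closer miRNA (about 3.6). Setting `alpha` between 0 and 1 trades off the two sources of information.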

CONCLUSIONS

The experimental results demonstrate that biological domain knowledge can benefit the estimation of missing values in miRNA profiles as well as in mRNA profiles. In particular, functional similarity defined by the GO terms annotated to the target genes of miRNAs can provide useful complementary information to the expression-based method and improve the imputation accuracy of miRNA array data. Our method and data are available to the public upon request.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/512c/4895707/2a902a479394/12859_2015_853_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/512c/4895707/885d7ee58ed8/12859_2015_853_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/512c/4895707/8d7aaf7a84f3/12859_2015_853_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/512c/4895707/2b1069c407cf/12859_2015_853_Fig4_HTML.jpg

Similar articles

1. Missing value imputation for microRNA expression data by using a GO-based similarity measure.
   BMC Bioinformatics. 2016 Jan 11;17 Suppl 1(Suppl 1):10. doi: 10.1186/s12859-015-0853-0.
2. Grouping miRNAs of similar functions via weighted information content of gene ontology.
   BMC Bioinformatics. 2016 Dec 22;17(Suppl 19):507. doi: 10.1186/s12859-016-1367-0.
3. A hybrid imputation approach for microarray missing value estimation.
   BMC Genomics. 2015;16 Suppl 9(Suppl 9):S1. doi: 10.1186/1471-2164-16-S9-S1. Epub 2015 Aug 17.
4. Missing value imputation for microarray data: a comprehensive comparison study and a web tool.
   BMC Syst Biol. 2013;7 Suppl 6(Suppl 6):S12. doi: 10.1186/1752-0509-7-S6-S12. Epub 2013 Dec 13.
5. A global learning with local preservation method for microarray data imputation.
   Comput Biol Med. 2016 Oct 1;77:76-89. doi: 10.1016/j.compbiomed.2016.08.005. Epub 2016 Aug 5.
6. The influence of missing value imputation on detection of differentially expressed genes from microarray data.
   Bioinformatics. 2005 Dec 1;21(23):4272-9. doi: 10.1093/bioinformatics/bti708. Epub 2005 Oct 10.
7. DNA microarray data imputation and significance analysis of differential expression.
   Bioinformatics. 2005 Nov 15;21(22):4155-61. doi: 10.1093/bioinformatics/bti638. Epub 2005 Aug 23.
8. Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data.
   Bioinformatics. 2005 May 15;21(10):2417-23. doi: 10.1093/bioinformatics/bti345. Epub 2005 Feb 24.
9. Two-pass imputation algorithm for missing value estimation in gene expression time series.
   J Bioinform Comput Biol. 2007 Oct;5(5):1005-22. doi: 10.1142/s0219720007003053.
10. Towards clustering of incomplete microarray data without the use of imputation.
    Bioinformatics. 2007 Jan 1;23(1):107-13. doi: 10.1093/bioinformatics/btl555. Epub 2006 Oct 31.

Cited by

1. Data Synthesis Reinvented: Preserving Missing Patterns for Enhanced Analysis.
   IEEE Trans Knowl Data Eng. 2025 Jul;37(7):3962-3975. doi: 10.1109/tkde.2025.3563319. Epub 2025 Apr 22.
2. Evaluating Genetic Regulators of MicroRNAs Using Machine Learning Models.
   Int J Mol Sci. 2025 Jun 16;26(12):5757. doi: 10.3390/ijms26125757.
3. Preserving Missing Data Distribution in Synthetic Data.
   Proc Int World Wide Web Conf. 2023 Apr-May;2023:2110-2121. doi: 10.1145/3543507.3583297. Epub 2023 Apr 30.
4. An efficient ensemble method for missing value imputation in microarray gene expression data.
   BMC Bioinformatics. 2021 Apr 13;22(1):188. doi: 10.1186/s12859-021-04109-4.
5. Imputation of Gene Expression Data in Blood Cancer and Its Significance in Inferring Biological Pathways.
   Front Oncol. 2020 Jan 8;9:1442. doi: 10.3389/fonc.2019.01442. eCollection 2019.
6. More Agility to Semantic Similarities Algorithm Implementations.
   Int J Environ Res Public Health. 2019 Dec 30;17(1):267. doi: 10.3390/ijerph17010267.
7. Bayesian multilevel model of micro RNA levels in ovarian-cancer and healthy subjects.
   PLoS One. 2019 Aug 29;14(8):e0221764. doi: 10.1371/journal.pone.0221764. eCollection 2019.
8. Classifying Incomplete Gene-Expression Data: Ensemble Learning with Non-Pre-Imputation Feature Filtering and Best-First Search Technique.
   Int J Mol Sci. 2018 Oct 30;19(11):3398. doi: 10.3390/ijms19113398.
9. Improving the measurement of semantic similarity by combining gene ontology and co-functional network: a random walk based approach.
   BMC Syst Biol. 2018 Mar 19;12(Suppl 2):18. doi: 10.1186/s12918-018-0539-0.
10. Genomic Approaches to Posttraumatic Stress Disorder: The Psychiatric Genomic Consortium Initiative.
    Biol Psychiatry. 2018 May 15;83(10):831-839. doi: 10.1016/j.biopsych.2018.01.020. Epub 2018 Feb 2.

References

1. Towards integrative gene functional similarity measurement.
   BMC Bioinformatics. 2014;15 Suppl 2(Suppl 2):S5. doi: 10.1186/1471-2105-15-S2-S5. Epub 2014 Jan 24.
2. miR148b is a major coordinator of breast cancer progression in a relapse-associated microRNA signature by targeting ITGA5, ROCK1, PIK3CA, NRAS, and CSF1.
   FASEB J. 2013 Mar;27(3):1223-35. doi: 10.1096/fj.12-214692. Epub 2012 Dec 11.
3. miR-10b*, a master inhibitor of the cell cycle, is down-regulated in human breast tumours.
   EMBO Mol Med. 2012 Nov;4(11):1214-29. doi: 10.1002/emmm.201201483.
4. Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases.
   Bioinformatics. 2010 Jul 1;26(13):1644-50. doi: 10.1093/bioinformatics/btq241. Epub 2010 May 3.
5. GOSemSim: an R package for measuring semantic similarity among GO terms and gene products.
   Bioinformatics. 2010 Apr 1;26(7):976-8. doi: 10.1093/bioinformatics/btq064. Epub 2010 Feb 23.
6. Missing value imputation for microarray gene expression data using histone acetylation information.
   BMC Bioinformatics. 2008 May 29;9:252. doi: 10.1186/1471-2105-9-252.
7. A new method to measure the semantic similarity of GO terms.
   Bioinformatics. 2007 May 15;23(10):1274-81. doi: 10.1093/bioinformatics/btm087. Epub 2007 Mar 7.
8. Correlation between gene expression and GO semantic similarity.
   IEEE/ACM Trans Comput Biol Bioinform. 2005 Oct-Dec;2(4):330-8. doi: 10.1109/TCBB.2005.50.
9. Gene functional similarity search tool (GFSST).
   BMC Bioinformatics. 2006 Mar 14;7:135. doi: 10.1186/1471-2105-7-135.
10. A microRNA expression signature of human solid tumors defines cancer gene targets.
    Proc Natl Acad Sci U S A. 2006 Feb 14;103(7):2257-61. doi: 10.1073/pnas.0510565103. Epub 2006 Feb 3.