Mining breast cancer genes with a network based noise-tolerant approach.

Author information

Nie Yaling, Yu Jingkai

Affiliation

National Key Laboratory of Biochemical Engineering, Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190, China.

Publication information

BMC Syst Biol. 2013 Jun 25;7:49. doi: 10.1186/1752-0509-7-49.

DOI:10.1186/1752-0509-7-49

PMID:23799982

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3702465/

Abstract

BACKGROUND

Mining novel breast cancer genes is an important task in breast cancer research. Many approaches prioritize candidate genes based on their similarity to known cancer genes, usually by integrating multiple data sources. However, different types of data often contain varying degrees of noise. For effective data integration, it's important to design methods that work robustly with respect to noise.

RESULTS

Gene Ontology (GO) annotations are often used in cancer gene mining studies. However, the vast majority of GO annotations are computationally derived and thus not completely accurate. A set of genes annotated with breast cancer enriched GO terms was adopted here as source data with realistic noise. A novel noise-tolerant approach was proposed to rank candidate breast cancer genes using noisy source data within the framework of a comprehensive human Protein-Protein Interaction (PPI) network. Performance of the proposed method was quantitatively evaluated by comparing it with the more established random walk approach. Results showed that the proposed method exhibited better performance in ranking known breast cancer genes and higher robustness against data noise than the random walk approach. When noise started to increase, the proposed method was able to maintain relatively stable performance, while the random walk approach showed a drastic performance decline; when noise increased to a large extent, the proposed method still achieved better performance than the random walk approach.
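For readers unfamiliar with the baseline, the sketch below illustrates a generic random walk with restart on a small undirected network. It is not the authors' implementation and does not reproduce the proposed noise-tolerant method, which the abstract does not specify. The function name `random_walk_with_restart`, the restart probability of 0.3, the handling of isolated nodes, and the toy node names are all assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return steady-state visit probabilities for a walker restarting at the seeds.

    adjacency: dict mapping each node to a list of its neighbours (undirected,
               every neighbour must also appear as a key).
    seeds:     iterable of source nodes (e.g. genes from the noisy source set).
    """
    nodes = sorted(adjacency)
    seed_set = set(seeds) & set(nodes)
    if not seed_set:
        raise ValueError("no seed is present in the network")

    # Restart distribution: uniform over the seeds, zero elsewhere.
    p0 = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    p = dict(p0)

    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to a seed.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            share = (1.0 - restart) * p[u]
            neighbours = adjacency[u]
            if not neighbours:
                nxt[u] += share  # assumption: an isolated node keeps its own mass
                continue
            for v in neighbours:
                nxt[v] += share / len(neighbours)
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network; node names are placeholders, not real genes.
toy_ppi = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
scores = random_walk_with_restart(toy_ppi, seeds=["A"])
# Rank candidates (non-seed nodes) by descending score.
ranking = sorted((n for n in toy_ppi if n != "A"), key=scores.get, reverse=True)
```

Because the restart vector is built directly from the source set, any incorrect source gene pulls probability mass towards its own neighbourhood, which is one plausible reason a plain random walk can be sensitive to noisy input.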

CONCLUSIONS

A novel noise-tolerant method was proposed to mine breast cancer genes. Compared to the well-established random walk approach, it showed better performance in correctly ranking cancer genes and worked robustly with respect to noise within source data. To the best of our knowledge, it is the first such effort to quantitatively analyze noise tolerance between different breast cancer gene mining methods. The sorted gene list can be valuable for breast cancer research. The proposed quantitative noise analysis method may also prove useful for other data integration efforts. It is hoped that the current work can lead to more discussion about the influence of data noise on different computational methods for mining disease genes.
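One way such a noise analysis could be set up is sketched below. The abstract does not describe the exact procedure, so this is an assumed design: a chosen fraction of the source genes is swapped for randomly drawn non-source genes, and ranking quality is summarized as the mean rank of known genes. The helper names `inject_noise` and `mean_rank`, the gene labels, and the noise definition are hypothetical.

```python
import random


def inject_noise(seeds, universe, noise_fraction, rng):
    """Replace a fraction of the seed genes with randomly drawn non-seed genes."""
    seeds = list(seeds)
    n_swap = round(noise_fraction * len(seeds))
    kept = rng.sample(seeds, len(seeds) - n_swap)
    # Decoys come from genes that were never in the original seed set.
    pool = sorted(set(universe) - set(seeds))
    decoys = rng.sample(pool, n_swap)
    return kept + decoys


def mean_rank(scores, known_genes):
    """Average 1-based rank of the known genes when sorted by descending score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    position = {gene: i + 1 for i, gene in enumerate(ordered)}
    return sum(position[g] for g in known_genes) / len(known_genes)


# Hypothetical gene labels for illustration only.
rng = random.Random(0)
universe = [f"g{i}" for i in range(20)]
clean_seeds = universe[:8]
noisy_seeds = inject_noise(clean_seeds, universe, noise_fraction=0.25, rng=rng)
```

Repeating this for increasing values of `noise_fraction`, scoring the network with each method, and plotting `mean_rank` against the noise level would give a per-method robustness curve under these assumptions.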

Figure (from the original article): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7cf4/3702465/95b62dd9e881/1752-0509-7-49-1.jpg
