Text mining in cancer gene and pathway prioritization.

Author information

Luo Yuan, Riedlinger Gregory, Szolovits Peter

Affiliations

Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA.

Department of Pathology, Massachusetts General Hospital, Boston, MA, USA.

Publication information

Cancer Inform. 2014 Oct 13;13(Suppl 1):69-79. doi: 10.4137/CIN.S13874. eCollection 2014.

DOI: 10.4137/CIN.S13874

PMID: 25392685

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4216063/

Abstract

Prioritization of cancer implicated genes has received growing attention as an effective way to reduce wet lab cost by computational analysis that ranks candidate genes according to the likelihood that experimental verifications will succeed. A multitude of gene prioritization tools have been developed, each integrating different data sources covering gene sequences, differential expressions, function annotations, gene regulations, protein domains, protein interactions, and pathways. This review places existing gene prioritization tools against the backdrop of an integrative Omic hierarchy view toward cancer and focuses on the analysis of their text mining components. We explain the relatively slow progress of text mining in gene prioritization, identify several challenges to current text mining methods, and highlight a few directions where more effective text mining algorithms may improve the overall prioritization task and where prioritizing the pathways may be more desirable than prioritizing only genes.
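The abstract describes tools that rank candidate genes by integrating evidence from several data sources (expression, interactions, literature, and so on). As a rough illustration only, and not the method of any tool covered by the review, the sketch below fuses per-source scores by averaging each gene's rank across sources. The source names, gene identifiers, and scores are hypothetical toy values.

```python
from statistics import mean


def rank_scores(scores):
    """Map each gene to its rank under one source (1 = best; higher score is better).

    Ties are broken alphabetically by gene name so the result is deterministic.
    """
    ordered = sorted(scores, key=lambda gene: (-scores[gene], gene))
    return {gene: i + 1 for i, gene in enumerate(ordered)}


def prioritize(sources):
    """Fuse per-source scores by mean rank (a simple, hypothetical fusion rule).

    Only genes scored by every source are ranked; a lower mean rank means higher priority.
    """
    genes = set.intersection(*(set(s) for s in sources.values()))
    ranks = [rank_scores({g: s[g] for g in genes}) for s in sources.values()]
    fused = {g: mean(r[g] for r in ranks) for g in genes}
    return sorted(genes, key=lambda g: (fused[g], g))


# Hypothetical toy scores for three candidate genes from three evidence types.
sources = {
    "expression": {"GENE_A": 0.9, "GENE_B": 0.4, "GENE_C": 0.1},
    "interaction": {"GENE_A": 0.7, "GENE_B": 0.8, "GENE_C": 0.2},
    "text_mining": {"GENE_A": 120, "GENE_B": 85, "GENE_C": 3},  # e.g. co-mention counts
}

if __name__ == "__main__":
    print(prioritize(sources))  # ['GENE_A', 'GENE_B', 'GENE_C']
```

Rank averaging is used here because it ignores the very different scales of each source (probabilities versus raw co-mention counts); real prioritization tools use a range of more principled fusion schemes.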


Similar articles

Comparison of vocabularies, representations and ranking algorithms for gene prioritization by text mining.

Bioinformatics. 2008 Aug 15;24(16):i119-25. doi: 10.1093/bioinformatics/btn291.

Gene prioritization and clustering by multi-view text mining.

BMC Bioinformatics. 2010 Jan 14;11:28. doi: 10.1186/1471-2105-11-28.

Computational disease gene prioritization: an appraisal.

J Comput Biol. 2014 Jun;21(6):456-65. doi: 10.1089/cmb.2013.0158. Epub 2014 Mar 25.

A Meta-Analysis Based Method for Prioritizing Candidate Genes Involved in a Pre-specific Function.

Front Plant Sci. 2016 Dec 15;7:1914. doi: 10.3389/fpls.2016.01914. eCollection 2016.

Prioritization of metabolic genes as novel therapeutic targets in estrogen-receptor negative breast tumors using multi-omics data and text mining.

Oncotarget. 2019 Jun 11;10(39):3894-3909. doi: 10.18632/oncotarget.26995.

ProphNet: a generic prioritization method through propagation of information.

BMC Bioinformatics. 2014;15 Suppl 1(Suppl 1):S5. doi: 10.1186/1471-2105-15-S1-S5. Epub 2014 Jan 10.

Prioritization of candidate genes for periodontitis using multiple computational tools.

J Periodontol. 2014 Aug;85(8):1059-69. doi: 10.1902/jop.2014.130523. Epub 2014 Jan 30.

Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research.

BMC Bioinformatics. 2015 Feb 21;16:55. doi: 10.1186/s12859-015-0472-9.

Disease gene prioritization by integrating tissue-specific molecular networks using a robust multi-network model.

BMC Bioinformatics. 2016 Nov 10;17(1):453. doi: 10.1186/s12859-016-1317-x.

Cited by

Text mining of verbal autopsy narratives to extract mortality causes and most prevalent diseases using natural language processing.

PLoS One. 2024 Sep 19;19(9):e0308452. doi: 10.1371/journal.pone.0308452. eCollection 2024.

Machine Learning for Lung Cancer Diagnosis, Treatment, and Prognosis.

Genomics Proteomics Bioinformatics. 2022 Oct;20(5):850-866. doi: 10.1016/j.gpb.2022.11.003. Epub 2022 Dec 1.

Deep learning for cancer type classification and driver gene identification.

BMC Bioinformatics. 2021 Oct 25;22(Suppl 4):491. doi: 10.1186/s12859-021-04400-4.

DES-ROD: Exploring Literature to Develop New Links between RNA Oxidation and Human Diseases.

Oxid Med Cell Longev. 2020 Mar 27;2020:5904315. doi: 10.1155/2020/5904315. eCollection 2020.

Cancer classification and pathway discovery using non-negative matrix factorization.

J Biomed Inform. 2019 Aug;96:103247. doi: 10.1016/j.jbi.2019.103247. Epub 2019 Jul 2.

Literature-Based Enrichment Insights into Redox Control of Vascular Biology.

Oxid Med Cell Longev. 2019 May 16;2019:1769437. doi: 10.1155/2019/1769437. eCollection 2019.

Using natural language processing and machine learning to identify breast cancer local recurrence.

BMC Bioinformatics. 2018 Dec 28;19(Suppl 17):498. doi: 10.1186/s12859-018-2466-x.

Natural Language Processing for EHR-Based Computational Phenotyping.

IEEE/ACM Trans Comput Biol Bioinform. 2019 Jan-Feb;16(1):139-153. doi: 10.1109/TCBB.2018.2849968. Epub 2018 Jun 25.

Contralateral Breast Cancer Event Detection Using Nature Language Processing.

AMIA Annu Symp Proc. 2018 Apr 16;2017:1885-1892. eCollection 2017.

A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts.

PLoS Comput Biol. 2018 Feb 15;14(2):e1005962. doi: 10.1371/journal.pcbi.1005962. eCollection 2018 Feb.

