Gene prioritization and clustering by multi-view text mining.

Affiliation

Bioinformatics Group, Department of Electrical Engineering, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, Heverlee B3001, Belgium.

Publication information

BMC Bioinformatics. 2010 Jan 14;11:28. doi: 10.1186/1471-2105-11-28.

DOI:10.1186/1471-2105-11-28
PMID:20074336
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3098068/
Abstract

BACKGROUND

Text mining has become a useful tool for biologists trying to understand the genetics of diseases. In particular, it can help identify the most interesting candidate genes for a disease for further experimental analysis. Many text mining approaches have been introduced, but their effectiveness at disease-gene identification varies from one text mining model to another. Combining several text mining models may therefore yield more refined and accurate knowledge. However, how to combine these models effectively remains a challenging question in machine learning. In particular, it is non-trivial to guarantee that the integrated model performs better than the best individual model.

RESULTS

We present a multi-view approach to retrieve biomedical knowledge using different controlled vocabularies. These controlled vocabularies are selected on the basis of nine well-known bio-ontologies and are applied to index the vast amounts of gene-based free-text information available in the MEDLINE repository. The text mining result obtained with each vocabulary is treated as a view, and the resulting views are integrated by multi-source learning algorithms. We investigate the effect of integration in two fundamental computational disease gene identification tasks: gene prioritization and gene clustering. The performance of the proposed approach is systematically evaluated and compared on real benchmark data sets. In both tasks, the multi-view approach performs significantly better than the methods it is compared against.
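The idea of one view per vocabulary, integrated for gene prioritization, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy vocabularies, gene names, and texts are hypothetical, and plain averaging of per-view cosine similarities stands in for the multi-source learning algorithms, which the abstract does not specify.

```python
import math
from collections import Counter

# Hypothetical controlled vocabularies; each one defines a view.
VOCABULARIES = {
    "view_a": {"apoptosis", "kinase", "tumor", "receptor"},
    "view_b": {"carcinoma", "leukemia", "tumor", "diabetes"},
}

# Hypothetical free text associated with each gene (toy data).
GENE_TEXT = {
    "GENE1": "tumor kinase apoptosis carcinoma tumor",
    "GENE2": "kinase receptor carcinoma",
    "GENE3": "diabetes receptor insulin",
    "GENE4": "leukemia tumor apoptosis",
}


def index_gene(text, vocabulary):
    """Term-frequency profile of a gene, restricted to one vocabulary."""
    return Counter(tok for tok in text.lower().split() if tok in vocabulary)


def cosine(p, q):
    """Cosine similarity between two sparse term-frequency profiles."""
    dot = sum(p[t] * q[t] for t in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def build_views(gene_text, vocabularies):
    """Index every gene once per vocabulary, giving one profile set per view."""
    return {
        name: {gene: index_gene(text, vocab) for gene, text in gene_text.items()}
        for name, vocab in vocabularies.items()
    }


def prioritize(views, training_genes, candidates):
    """Rank candidates by similarity to the training genes, averaged over views.

    Simple averaging is an illustrative stand-in for the paper's
    integration step, not the authors' algorithm.
    """
    scores = {}
    for gene in candidates:
        per_view = []
        for profiles in views.values():
            sims = [cosine(profiles[gene], profiles[t]) for t in training_genes]
            per_view.append(sum(sims) / len(sims))
        scores[gene] = sum(per_view) / len(per_view)
    ranking = sorted(candidates, key=lambda g: scores[g], reverse=True)
    return ranking, scores


views = build_views(GENE_TEXT, VOCABULARIES)
ranking, scores = prioritize(views, ["GENE1"], ["GENE2", "GENE3", "GENE4"])
print(ranking)
```

In this toy setup GENE1 plays the role of a known disease gene, and the candidates are ordered by how closely their indexed text resembles it across both views. A candidate that shares no vocabulary terms with the training gene in any view scores zero.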

CONCLUSIONS

In practical research, the relevance of a specific vocabulary to the task at hand is usually unknown. In such cases, multi-view text mining is a superior and promising strategy for text-based disease gene identification.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/00ab/3098068/4c1b44d21b0d/1471-2105-11-28-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/00ab/3098068/a82604c57692/1471-2105-11-28-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/00ab/3098068/8f5d50c15b28/1471-2105-11-28-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/00ab/3098068/9d008e62dbc1/1471-2105-11-28-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/00ab/3098068/197a4a9c095c/1471-2105-11-28-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/00ab/3098068/2bc051be57b1/1471-2105-11-28-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/00ab/3098068/62a6b3d2b034/1471-2105-11-28-7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/00ab/3098068/8a009cc64b21/1471-2105-11-28-8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/00ab/3098068/ed5308f9dca8/1471-2105-11-28-9.jpg
