Text mining biomedical literature for discovering gene-to-gene relationships: a comparative study of algorithms.

Authors

Liu Ying, Navathe Shamkant B, Civera Jorge, Dasigi Venu, Ram Ashwin, Ciliax Brian J, Dingledine Ray

Affiliation

College of Computing, Georgia Institute of Technology, 801 Atlantic Drive, Atlanta, GA 30322, USA.

Publication

IEEE/ACM Trans Comput Biol Bioinform. 2005 Jan-Mar;2(1):62-76. doi: 10.1109/TCBB.2005.14.

DOI: 10.1109/TCBB.2005.14
PMID: 17044165
Abstract

Partitioning closely related genes into clusters has become an important element of practically all statistical analyses of microarray data. A number of computer algorithms have been developed for this task. Although these algorithms have demonstrated their usefulness for gene clustering, some basic problems remain. This paper describes our work on extracting functional keywords from MEDLINE for a set of genes that were selected from microarray experiments for further study on the basis of their differential expression patterns. The sharing of functional keywords among genes is used as the basis for clustering in a new approach, which this paper calls BEA-PARTITION. Functional keywords associated with genes were extracted from MEDLINE abstracts. We modified the Bond Energy Algorithm (BEA), which is widely accepted in psychology and database design but virtually unknown in bioinformatics, to cluster genes by functional keyword associations. The results showed that BEA-PARTITION and the hierarchical clustering algorithm outperformed k-means clustering and the self-organizing map (SOM) by correctly assigning 25 of 26 genes in a test set of four known gene groups. To evaluate the effectiveness of BEA-PARTITION for clustering genes identified by microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle and have been widely studied in the literature were used as a second test set. By established measures of cluster quality, BEA-PARTITION produced clusters with higher purity, lower entropy, and higher mutual information than k-means and the SOM. Although BEA-PARTITION and hierarchical clustering produced clusters of similar quality, BEA-PARTITION provides clearer cluster boundaries than hierarchical clustering. BEA-PARTITION is simple to implement and provides a powerful approach to clustering genes, or to any clustering problem where starting matrices are available from experimental observations.
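
To make the abstract's idea more concrete, here is a minimal Python sketch of the classic Bond Energy Algorithm ordering step (the greedy column-insertion form used in database design) applied to a gene-by-gene shared-keyword matrix, followed by purity and entropy scores. It is not the authors' BEA-PARTITION: the abstract does not say how BEA was modified or how cluster boundaries are chosen, so several pieces are assumptions. The affinity weights (raw shared-keyword counts), the cut rule `split_by_bond`, and the genes `gA` through `gD` with their keywords are all hypothetical. The entropy shown is one common formulation (size-weighted, base 2, unnormalized), and the paper may normalize differently.

```python
import math
from collections import Counter
from itertools import combinations


def affinity_matrix(keywords):
    """Gene-by-gene matrix: entry (i, j) = number of keywords genes i and j share.
    Assumption: raw shared-keyword counts stand in for the paper's keyword weights;
    the diagonal holds each gene's own keyword count."""
    genes = sorted(keywords)
    n = len(genes)
    aff = [[0] * n for _ in range(n)]
    for i in range(n):
        aff[i][i] = len(keywords[genes[i]])
    for i, j in combinations(range(n), 2):
        aff[i][j] = aff[j][i] = len(keywords[genes[i]] & keywords[genes[j]])
    return genes, aff


def bond(aff, x, y):
    """bond(x, y) = sum_z aff[z][x] * aff[z][y]; None marks the empty border column."""
    if x is None or y is None:
        return 0
    return sum(row[x] * row[y] for row in aff)


def bea_order(aff):
    """Classic BEA: insert each column where it adds the most bond energy."""
    n = len(aff)
    order = list(range(min(n, 2)))
    for k in range(2, n):
        best_pos, best_cont = 0, None
        for pos in range(len(order) + 1):
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            # cont = 2*bond(left, k) + 2*bond(k, right) - 2*bond(left, right)
            cont = 2 * (bond(aff, left, k) + bond(aff, k, right) - bond(aff, left, right))
            if best_cont is None or cont > best_cont:
                best_pos, best_cont = pos, cont
        order.insert(best_pos, k)
    return order


def split_by_bond(aff, order, threshold=0):
    """Hypothetical cut rule, NOT the paper's BEA-PARTITION criterion:
    start a new cluster wherever the bond between neighbours is <= threshold."""
    clusters = [[order[0]]] if order else []
    for prev, cur in zip(order, order[1:]):
        if bond(aff, prev, cur) <= threshold:
            clusters.append([cur])
        else:
            clusters[-1].append(cur)
    return clusters


def purity(clusters, labels):
    """Fraction of genes that belong to the majority class of their cluster."""
    total = sum(len(c) for c in clusters)
    return sum(Counter(labels[g] for g in c).most_common(1)[0][1] for c in clusters) / total


def entropy(clusters, labels):
    """Size-weighted average of per-cluster class entropy (base 2, unnormalized)."""
    total = sum(len(c) for c in clusters)
    result = 0.0
    for c in clusters:
        counts = Counter(labels[g] for g in c).values()
        h = sum(-(k / len(c)) * math.log2(k / len(c)) for k in counts)
        result += len(c) / total * h
    return result


# Hypothetical genes and keywords, for illustration only.
keywords = {
    "gA": {"mitosis", "spindle"},
    "gB": {"G1", "budding"},
    "gC": {"mitosis", "spindle"},
    "gD": {"G1", "budding"},
}
labels = {"gA": "M", "gB": "G1", "gC": "M", "gD": "G1"}

genes, aff = affinity_matrix(keywords)
order = bea_order(aff)
clusters = [[genes[i] for i in c] for c in split_by_bond(aff, order)]
print(clusters)  # [['gC', 'gA'], ['gD', 'gB']]
print(purity(clusters, labels), entropy(clusters, labels))  # 1.0 0.0
```

Because the two hypothetical keyword groups are interleaved in the input order, it is the BEA ordering `[2, 0, 3, 1]` that brings each group's genes next to each other; the cut step then only has to find the weak bond between the two blocks.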

Similar articles

1. Text mining biomedical literature for discovering gene-to-gene relationships: a comparative study of algorithms.
   IEEE/ACM Trans Comput Biol Bioinform. 2005 Jan-Mar;2(1):62-76. doi: 10.1109/TCBB.2005.14.
2. Comparison of two schemes for automatic keyword extraction from MEDLINE for functional gene clustering.
   Proc IEEE Comput Syst Bioinform Conf. 2004:394-404. doi: 10.1109/csb.2004.1332452.
3. A multi-level text mining method to extract biological relationships.
   Proc IEEE Comput Soc Bioinform Conf. 2002;1:97-108.
4. Inferring modules of functionally interacting proteins using the Bond Energy Algorithm.
   BMC Bioinformatics. 2008 Jun 17;9:285. doi: 10.1186/1471-2105-9-285.
5. Gene Ontology friendly biclustering of expression profiles.
   Proc IEEE Comput Syst Bioinform Conf. 2004:436-47.
6. Discovering patterns to extract protein-protein interactions from the literature: Part II.
   Bioinformatics. 2005 Aug 1;21(15):3294-300. doi: 10.1093/bioinformatics/bti493. Epub 2005 May 12.
7. Literature mining and database annotation of protein phosphorylation using a rule-based system.
   Bioinformatics. 2005 Jun 1;21(11):2759-65. doi: 10.1093/bioinformatics/bti390. Epub 2005 Apr 6.
8. Co-occurrence based meta-analysis of scientific texts: retrieving biological relationships between genes.
   Bioinformatics. 2005 May 1;21(9):2049-58. doi: 10.1093/bioinformatics/bti268. Epub 2005 Jan 18.
9. Clustering of gene expression data: performance and similarity analysis.
   BMC Bioinformatics. 2006 Dec 12;7 Suppl 4(Suppl 4):S19. doi: 10.1186/1471-2105-7-S4-S19.
10. Attribute clustering for grouping, selection, and classification of gene expression data.
   IEEE/ACM Trans Comput Biol Bioinform. 2005 Apr-Jun;2(2):83-101. doi: 10.1109/TCBB.2005.17.

Cited by

1. Disease causality extraction based on lexical semantics and document-clause frequency from biomedical literature.
   BMC Med Inform Decis Mak. 2017 May 18;17(Suppl 1):53. doi: 10.1186/s12911-017-0448-y.
2. Integrating text mining, data mining, and network analysis for identifying genetic breast cancer trends.
   BMC Res Notes. 2016 Apr 26;9:236. doi: 10.1186/s13104-016-2023-5.
3. Clinical decision support systems in myocardial perfusion imaging.
   J Nucl Cardiol. 2014 Jun;21(3):427-39; quiz 440. doi: 10.1007/s12350-014-9857-9. Epub 2014 Jan 31.
4. Semantic relations for interpreting DNA microarray data.
   AMIA Annu Symp Proc. 2009 Nov 14;2009:255-9.
5. A hybrid approach for biomarker discovery from microarray gene expression data for cancer classification.
   Cancer Inform. 2007 Feb 22;2:301-11.
6. Inference of gene pathways using mixture Bayesian networks.
   BMC Syst Biol. 2009 May 19;3:54. doi: 10.1186/1752-0509-3-54.
7. Evaluation of a gene information summarization system by users during the analysis process of microarray datasets.
   BMC Bioinformatics. 2009 Feb 5;10 Suppl 2(Suppl 2):S5. doi: 10.1186/1471-2105-10-S2-S5.
8. Defrosting the digital library: bibliographic tools for the next generation web.
   PLoS Comput Biol. 2008 Oct;4(10):e1000204. doi: 10.1371/journal.pcbi.1000204. Epub 2008 Oct 31.
9. Inferring modules of functionally interacting proteins using the Bond Energy Algorithm.
   BMC Bioinformatics. 2008 Jun 17;9:285. doi: 10.1186/1471-2105-9-285.
10. A document clustering and ranking system for exploring MEDLINE citations.
   J Am Med Inform Assoc. 2007 Sep-Oct;14(5):651-61. doi: 10.1197/jamia.M2215. Epub 2007 Jun 28.