Sequential pattern mining for discovering gene interactions and their contextual information from biomedical texts.

Authors

Cellier Peggy, Charnois Thierry, Plantevit Marc, Rigotti Christophe, Crémilleux Bruno, Gandrillon Olivier, Kléma Jiří, Manguin Jean-Luc

Affiliations

INSA de Rennes, IRISA, UMR6074, Rennes, F-35042 France.

Université de Paris 13, LIPN, UMR7030, Villetaneuse, F-93430 France.

Publication information

J Biomed Semantics. 2015 May 18;6:27. doi: 10.1186/s13326-015-0023-3. eCollection 2015.

Abstract

BACKGROUND

Discovering gene interactions and their characterizations from biological text collections is a crucial issue in bioinformatics. Text collections are large, and it is very difficult for biologists to take full advantage of this amount of knowledge. Natural Language Processing (NLP) methods have been applied to extract background knowledge from biomedical texts. Some existing NLP approaches are based on handcrafted rules and are thus time-consuming to build and often tied to a specific corpus. Machine-learning-based NLP methods give good results but generate outcomes that are not readily understandable by a user.

RESULTS

We take advantage of a hybridization of data mining and natural language processing to propose an original symbolic method that automatically produces patterns conveying gene interactions and their characterizations. Our method therefore detects not only gene interactions but also semantic information about the extracted interactions (e.g., modalities, biological contexts, interaction types). Only a limited resource is required: the text collection that is used as a training corpus. Our approach gives results comparable to those of state-of-the-art methods and performs even better for gene interaction detection on AIMed.
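To make the idea of mining sequential patterns from sentences concrete, here is a minimal Python sketch. It is a generic, toy level-wise miner and not the authors' algorithm or software: the `GENE` placeholder token, the toy sentences, the function names, and the `min_support` and `max_len` parameters are all assumptions made for illustration. A pattern is treated as frequent when it occurs as an ordered subsequence (gaps allowed) in at least `min_support` sentences.

```python
def is_subsequence(pattern, sequence):
    """Return True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so order is enforced.
    return all(item in it for item in pattern)


def support(pattern, sequences):
    """Number of sequences that contain `pattern` as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in sequences)


def mine_sequential_patterns(sequences, min_support, max_len=4):
    """Level-wise search: grow each frequent pattern by one item at a time.

    Every prefix of a frequent pattern is itself frequent, so extending
    only frequent prefixes finds all frequent patterns up to `max_len`.
    """
    items = sorted({item for s in sequences for item in s})
    frequent = {}
    frontier = [()]
    for _ in range(max_len):
        next_frontier = []
        for prefix in frontier:
            for item in items:
                candidate = prefix + (item,)
                count = support(candidate, sequences)
                if count >= min_support:
                    frequent[candidate] = count
                    next_frontier.append(candidate)
        if not next_frontier:
            break
        frontier = next_frontier
    return frequent


# Hypothetical toy corpus: tokenized sentences with gene names
# replaced by the placeholder token "GENE".
toy_sentences = [
    ["GENE", "interacts", "with", "GENE"],
    ["GENE", "directly", "interacts", "with", "GENE"],
    ["GENE", "binds", "to", "GENE"],
    ["expression", "of", "GENE", "was", "measured"],
]

patterns = mine_sequential_patterns(toy_sentences, min_support=2)
for pattern, count in sorted(patterns.items(), key=lambda kv: (-len(kv[0]), kv[0])):
    print(" ".join(pattern), count)
```

In this sketch, a pattern such as `GENE interacts with GENE` survives because it matches two sentences despite the intervening word "directly", which hints at why such patterns remain readable by a user. A real system would add linguistic information and constraints to filter the many uninformative patterns a plain miner returns.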

CONCLUSIONS

Experiments show how our approach enables the discovery of interactions and their characterizations. To the best of our knowledge, there are few methods that automatically extract both the interactions and their associated semantic information. The gene interactions extracted from PubMed are available through a simple web interface at https://bingotexte.greyc.fr/. The software is available at https://bingo2.greyc.fr/?q=node/22.
