Identification of histone modifications in biomedical text for supporting epigenomic research.

Author information

Kolárik Corinna, Klinger Roman, Hofmann-Apitius Martin

Affiliation

Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, D-53754 Sankt Augustin, Germany.

Publication information

BMC Bioinformatics. 2009 Jan 30;10 Suppl 1(Suppl 1):S28. doi: 10.1186/1471-2105-10-S1-S28.

Abstract

BACKGROUND

Posttranslational modifications of histones influence the structure of chromatin and thereby take part in the regulation of gene expression. Certain histone modification patterns, distributed over the genome, are connected to cell and tissue differentiation and to the adaptation of organisms to their environment. Abnormal changes, in contrast, influence the development of disease states such as cancer. The regulatory mechanisms for modifying histones and their functions are the subject of epigenomic investigation and are still not completely understood. Text provides a rich resource of knowledge on epigenomics and on histone modifications in particular. It contains information about experimental studies, the conditions used, and the results. To our knowledge, no approach for identifying histone modifications in text has been published so far.

RESULTS

We have developed an approach for identifying histone modifications in biomedical literature with Conditional Random Fields (CRF) and for resolving the recognized histone modification term variants by term standardization. For term identification, F1 measures of 0.84 (10-fold cross-validation on the training corpus) and 0.81 (independent test corpus) were obtained. Standardization correctly transformed 96% of the terms from the training corpus and 98% of the terms from the test corpus. Because no existing terminology exhaustively covers specific histone modification types, we developed a histone modification term hierarchy for use in a semantic text retrieval system.
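The term standardization step can be illustrated with a minimal sketch. The abstract does not describe the actual rules, so everything below is an assumption: the canonical target form (a compact notation such as `H3K4me3`), the two regular expressions, the lookup tables, and the function name `standardize` are all hypothetical and only show how surface variants of one modification might be mapped to a single term.

```python
import re

# Hypothetical lookup tables for illustration; not taken from the paper.
RESIDUES = {"lysine": "K", "lys": "K", "k": "K",
            "arginine": "R", "arg": "R", "r": "R",
            "serine": "S", "ser": "S", "s": "S",
            "threonine": "T", "thr": "T", "t": "T"}
MODS = {"methyl": "me", "acetyl": "ac", "phosphoryl": "ph", "ubiquit": "ub"}
DEGREES = {"mono": "1", "di": "2", "tri": "3"}

# Compact variants, e.g. "H3K4me3", "H3-K9 ac", "H3 Lys4 me2".
COMPACT = re.compile(
    r"(?P<histone>H(?:2A|2B|3|4))[\s-]*"
    r"(?P<res>lys|arg|ser|thr|[KRST])[\s-]*(?P<pos>\d+)[\s-]*"
    r"(?P<mod>me|ac|ph|ub)(?P<deg>[123])?",
    re.IGNORECASE,
)

# Descriptive variants, e.g. "trimethylation of lysine 4 on histone H3".
DESCRIPTIVE = re.compile(
    r"(?P<deg>mono|di|tri)?-?(?P<mod>methyl|acetyl|phosphoryl|ubiquit)\w*\s+"
    r"(?:of\s+|at\s+)?(?P<res>lysine|arginine|serine|threonine)[\s-]*(?P<pos>\d+)\s+"
    r"(?:of|on|in)\s+(?:histone\s+)?(?P<histone>H(?:2A|2B|3|4))",
    re.IGNORECASE,
)


def standardize(term):
    """Map a recognized term variant to a compact canonical form, or None."""
    text = term.strip()
    match = COMPACT.fullmatch(text)
    if match:
        mod = match.group("mod").lower()
        degree = match.group("deg") or ""
    else:
        match = DESCRIPTIVE.fullmatch(text)
        if match is None:
            return None  # variant not covered by these illustrative rules
        mod = MODS[match.group("mod").lower()]
        degree = DEGREES.get((match.group("deg") or "").lower(), "")
    histone = match.group("histone").upper()
    residue = RESIDUES[match.group("res").lower()]
    return f"{histone}{residue}{match.group('pos')}{mod}{degree}"


if __name__ == "__main__":
    for variant in ["H3K4me3", "H3-K9 ac",
                    "trimethylation of lysine 4 on histone H3"]:
        print(variant, "->", standardize(variant))
```

In a pipeline like the one described, a function of this kind would run on the spans tagged by the CRF, so that different spellings of the same modification resolve to one entry in the term hierarchy. Returning `None` for uncovered variants keeps unresolved terms visible rather than silently guessing.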

CONCLUSION

The developed approach substantially improves the retrieval of articles describing histone modifications. Because text contains contextual information about the studies and experiments performed, identifying histone modifications is the basis for supporting literature-based knowledge discovery and hypothesis generation to accelerate epigenomic research.
