Automated extraction of precise protein expression patterns in lymphoma by text mining abstracts of immunohistochemical studies.

Author information

Chang Jia-Fu, Popescu Mihail, Arthur Gerald L

Affiliation

MU Informatics Institute, University of Missouri, Columbia, USA.

Publication information

J Pathol Inform. 2013 Jul 31;4:20. doi: 10.4103/2153-3539.115880. eCollection 2013.

Abstract

BACKGROUND

In general, surgical pathology reviews report protein expression by tumors in a semi-quantitative manner, that is, -, -/+, +/-, +. At the same time, the experimental pathology literature provides multiple examples of precise expression levels determined by immunohistochemical (IHC) tissue examination of populations of tumors. Natural language processing (NLP) techniques enable the automated extraction of such information through text mining. We propose establishing a database linking quantitative protein expression levels with specific tumor classifications through NLP.

MATERIALS AND METHODS

Our method takes advantage of the typical way experimental findings are reported: as the percentage of the tumor population under study that expresses a given protein. Characteristically, percentages are written either directly with the % symbol or as the number of positive findings out of the total population. Such text is readily recognized using regular expressions and templates, which permits extraction of the sentences containing these forms for further analysis using grammatical structures and rule-based algorithms.
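The two surface forms described above can be illustrated with a minimal sketch. The abstract does not give the authors' actual expressions or templates, so the patterns `PERCENT` and `COUNT`, the function `extract_expression_rates`, and the sample sentences below are all hypothetical and serve only to show the general idea.

```python
import re

# Hypothetical patterns (assumptions, not the authors' expressions).
# Form 1: a number followed by the % symbol, e.g. "85%".
PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s*%")
# Form 2: positives out of a total, e.g. "12 of 40", "12 out of 40", "12/40".
COUNT = re.compile(r"(\d+)\s*(?:of|out of|/)\s*(\d+)")


def extract_expression_rates(sentence):
    """Return (matched_text, percentage) pairs found in one sentence."""
    results = []
    for m in PERCENT.finditer(sentence):
        results.append((m.group(0), float(m.group(1))))
    for m in COUNT.finditer(sentence):
        positive, total = int(m.group(1)), int(m.group(2))
        # Skip impossible counts such as "40 of 12" or a zero total.
        if total > 0 and positive <= total:
            results.append((m.group(0), 100.0 * positive / total))
    return results


# Invented example sentences, not taken from the study.
print(extract_expression_rates("CD20 was expressed in 85% of cases."))
print(extract_expression_rates("CD10 was positive in 12 of 40 tumors."))
```

A sketch like this only flags candidate sentences and numbers. Linking each value to a specific protein and tumor classification is the part the abstract assigns to grammatical structures and rule-based algorithms, which is not shown here.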

RESULTS

Our pilot study is limited to the extraction of such information related to lymphomas. We achieved a satisfactory level of retrieval as reflected in scores of 69.91% precision and 57.25% recall with an F-score of 62.95%. In addition, we demonstrate the utility of a web-based curation tool for confirming and correcting our findings.
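The reported scores can be checked with a short calculation. The abstract does not say which F-measure was used; the sketch below assumes the balanced F-score (the harmonic mean of precision and recall), which is consistent with the reported figures. The function name `f_score` is illustrative.

```python
def f_score(precision, recall):
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
print(round(f_score(69.91, 57.25), 2))  # prints 62.95
```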

CONCLUSIONS

The experimental pathology literature represents a rich source of pathobiological information, which has been relatively underutilized. There has been a combinatorial explosion of knowledge within the pathology domain, as represented by increasing numbers of immunophenotypes and disease subclassifications. NLP supports practical text mining techniques for extracting this knowledge and organizing it in forms appropriate for pathology decision support systems.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/00c0/3746413/98217a951401/JPI-4-20-g001.jpg