Automated extraction of precise protein expression patterns in lymphoma by text mining abstracts of immunohistochemical studies.

Authors

Chang Jia-Fu, Popescu Mihail, Arthur Gerald L

Affiliation

MU Informatics Institute, University of Missouri, Columbia, USA.

Publication

J Pathol Inform. 2013 Jul 31;4:20. doi: 10.4103/2153-3539.115880. eCollection 2013.

DOI:10.4103/2153-3539.115880
PMID:23967385
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3746413/
Abstract

BACKGROUND

In general, surgical pathology reviews report protein expression by tumors in a semi-quantitative manner, that is, -, -/+, +/-, +. At the same time, the experimental pathology literature provides multiple examples of precise expression levels determined by immunohistochemical (IHC) tissue examination of populations of tumors. Natural language processing (NLP) techniques enable the automated extraction of such information through text mining. We propose establishing a database linking quantitative protein expression levels with specific tumor classifications through NLP.

MATERIALS AND METHODS

Our method takes advantage of typical forms of representing experimental findings in terms of percentages of protein expression manifest by the tumor population under study. Characteristically, percentages are represented straightforwardly with the % symbol or as the number of positive findings of the total population. Such text is readily recognized using regular expressions and templates permitting extraction of sentences containing these forms for further analysis using grammatical structures and rule-based algorithms.
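
The two surface forms described above (a number followed by the % symbol, and a count of positives out of a total) can be illustrated with a minimal Python sketch. The patterns, the function name `extract_expression_rates`, and the example sentences are assumptions made for illustration; the abstract does not give the authors' actual regular expressions, templates, or downstream grammatical rules.

```python
import re

# Hypothetical patterns, not the ones used in the paper.
# "45%" or "45.5 %": a number followed by the percent symbol.
PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s*%")
# "18 of 20" or "15/30": positive findings out of a total population.
FRACTION = re.compile(r"\b(\d+)\s*(?:of|/)\s*(\d+)\b")


def extract_expression_rates(sentence):
    """Return (matched_text, percentage) pairs found in one sentence."""
    results = []
    for m in PERCENT.finditer(sentence):
        results.append((m.group(0), float(m.group(1))))
    for m in FRACTION.finditer(sentence):
        positive, total = int(m.group(1)), int(m.group(2))
        # Skip implausible counts such as "20 of 18" or a zero total.
        if total > 0 and positive <= total:
            results.append((m.group(0), round(100.0 * positive / total, 2)))
    return results


if __name__ == "__main__":
    # Made-up sentences, used only to exercise the patterns.
    print(extract_expression_rates("BCL2 was expressed in 45% of tumors."))
    print(extract_expression_rates("CD10 was positive in 15/30 cases."))
```

A sentence that yields at least one match would then be passed on for further analysis; linking each percentage to a protein and a tumor class is the part the abstract attributes to grammatical structures and rule-based algorithms, and it is not attempted here.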

RESULTS

Our pilot study is limited to the extraction of such information related to lymphomas. We achieved a satisfactory level of retrieval as reflected in scores of 69.91% precision and 57.25% recall with an F-score of 62.95%. In addition, we demonstrate the utility of a web-based curation tool for confirming and correcting our findings.
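
The reported F-score is consistent with the standard balanced F-measure, the harmonic mean of precision and recall. The short Python check below assumes that definition (the abstract does not state which F-measure was used); the function name `f_score` is chosen for illustration.

```python
def f_score(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall as reported in the abstract, in percent.
    print(round(f_score(69.91, 57.25), 2))  # prints 62.95
```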

CONCLUSIONS

The experimental pathology literature represents a rich source of pathobiological information, which has been relatively underutilized. There has been a combinatorial explosion of knowledge within the pathology domain as represented by increasing numbers of immunophenotypes and disease subclassifications. NLP techniques support practical text mining techniques for extracting this knowledge and organizing it in forms appropriate for pathology decision support systems.
