Automatic lymphoma classification with sentence subgraph mining from pathology reports.

Affiliations

Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Cambridge, Massachusetts, USA.

Publication information

J Am Med Inform Assoc. 2014 Sep-Oct;21(5):824-32. doi: 10.1136/amiajnl-2013-002443. Epub 2014 Jan 15.

DOI:10.1136/amiajnl-2013-002443

PMID:24431333

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4147603/

Abstract

OBJECTIVE

Pathology reports are rich in narrative statements that encode a complex web of relations among medical concepts. Doctors routinely use these relations to reason about diagnoses, but extracting them into prespecified forms for computational disease modeling often requires hand-crafted rules or supervised learning. We aim to automatically capture relations from narrative text without supervision.

METHODS

We design a novel framework that translates sentences into graph representations, automatically mines sentence subgraphs, reduces redundancy in mined subgraphs, and automatically generates subgraph features for subsequent classification tasks. To ensure meaningful interpretations over the sentence graphs, we use the Unified Medical Language System Metathesaurus to map token subsequences to concepts, and in turn to sentence graph nodes. We test our system with multiple lymphoma classification tasks that together mimic the differential diagnosis by a pathologist. To this end, we prevent our classifiers from looking at explicit mentions or synonyms of lymphomas in the text.
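The pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `LEXICON` dictionary is a hypothetical stand-in for a UMLS Metathesaurus lookup, edges simply link neighbouring concepts instead of coming from a syntactic parse, and "mining" is reduced to counting single-edge subgraphs that recur across sentences. All function and variable names are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for a UMLS Metathesaurus lookup.
LEXICON = {
    ("large", "cells"): "C_LARGE_CELL",
    ("cd20",): "C_CD20",
    ("positive",): "C_POSITIVE",
    ("lymphoma",): "C_LYMPHOMA",
}
# Concepts hidden from the classifier (explicit lymphoma mentions).
MASKED = {"C_LYMPHOMA"}


def map_concepts(tokens):
    """Greedy longest-match of token subsequences to concept IDs."""
    concepts, i = [], 0
    max_len = max(len(key) for key in LEXICON)
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in LEXICON:
                if LEXICON[key] not in MASKED:
                    concepts.append(LEXICON[key])
                i += n
                break
        else:
            i += 1  # no concept starts at this token; skip it
    return concepts


def sentence_edges(concepts):
    """Link neighbouring concepts (assumed stand-in for parse-based edges)."""
    return {
        tuple(sorted(pair))
        for pair in zip(concepts, concepts[1:])
        if pair[0] != pair[1]
    }


def mine_frequent_edges(sentences, min_support=2):
    """Keep single-edge subgraphs seen in at least min_support sentences."""
    support = Counter()
    for tokens in sentences:
        support.update(sentence_edges(map_concepts(tokens)))
    return {edge: n for edge, n in support.items() if n >= min_support}


def featurize(tokens, features):
    """Binary feature vector: which mined edges occur in this sentence."""
    edges = sentence_edges(map_concepts(tokens))
    return [int(edge in edges) for edge in sorted(features)]


sentences = [
    "large cells cd20 positive".split(),
    "lymphoma cd20 positive".split(),
]
features = mine_frequent_edges(sentences)
```

A real system would mine larger connected subgraphs and prune redundant ones; this sketch only shows how concept mapping, masking, graph construction, and feature generation fit together.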

RESULTS AND CONCLUSIONS

We compare our system with three baseline classifiers using standard n-grams, full MetaMap concepts, and filtered MetaMap concepts. Our system achieves high F-measures on multiple binary classifications of lymphoma (Burkitt lymphoma, 0.8; diffuse large B-cell lymphoma, 0.909; follicular lymphoma, 0.84; Hodgkin lymphoma, 0.912). Significance tests show that our system outperforms all three baselines. Moreover, feature analysis identifies subgraph features that contribute to improved performance; these features agree with current knowledge about lymphoma classification. We also highlight how these unsupervised relation features may provide meaningful insights into lymphoma classification.
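For readers unfamiliar with the metric, the F-measure reported above is the harmonic mean of precision and recall when both are weighted equally. The sketch below shows the standard formula; the counts in the example are invented for illustration and are not taken from the study.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative counts only: 8 correct positives, 2 false alarms, 2 misses.
example_score = f_measure(tp=8, fp=2, fn=2)
```

With `beta=1.0` precision and recall count equally; a larger `beta` would weight recall more heavily.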
