Mu Youqing, Tizhoosh Hamid R, Tayebi Rohollah Moosavi, Ross Catherine, Sur Monalisa, Leber Brian, Campbell Clinton J V
McMaster University, Hamilton, ON Canada.
Kimia Lab, University of Waterloo, Waterloo, ON Canada.
Commun Med (Lond). 2021 Jul 5;1:11. doi: 10.1038/s43856-021-00008-0. eCollection 2021.
Pathology synopses consist of semi-structured or unstructured text summarizing visual information obtained by observing human tissue. Experts write and interpret these synopses using extensive domain-specific knowledge to extract tissue semantics and formulate a diagnosis in the context of ancillary testing and clinical information. The limited number of specialists available to interpret pathology synopses restricts the utility of the information they contain. Deep learning offers a tool for information extraction and automatic feature generation from complex datasets.
Using an active learning approach, we developed a set of semantic labels for bone marrow aspirate pathology synopses. We then trained a transformer-based deep-learning model to map these synopses to one or more semantic labels, and extracted learned embeddings (i.e., meaningful attributes) from the model's hidden layer.
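As a rough illustration of what mapping a synopsis to one or more semantic labels can look like, the sketch below applies an independent sigmoid to each label's score and keeps every label above a threshold. It also ranks unlabeled synopses by uncertainty, which is one common active learning strategy and not necessarily the one used in this study. The label names, scores, 0.5 threshold, and function names are all hypothetical.

```python
import math
from typing import Dict, List


def sigmoid(x: float) -> float:
    """Map a raw model score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def assign_labels(logits: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multi-label decision: keep every label whose probability reaches the threshold."""
    return [label for label, z in logits.items() if sigmoid(z) >= threshold]


def uncertainty_margin(logits: Dict[str, float]) -> float:
    """Smallest distance of any label probability from 0.5 (lower = more uncertain)."""
    return min(abs(sigmoid(z) - 0.5) for z in logits.values())


def most_uncertain(pool: Dict[str, Dict[str, float]]) -> str:
    """Pick the unlabeled synopsis an expert would be asked to annotate next."""
    return min(pool, key=lambda name: uncertainty_margin(pool[name]))


# Hypothetical per-label scores for one synopsis (not taken from the study).
example_logits = {"hypercellular": 2.0, "acute leukemia": 0.3, "normal": -3.0}
assigned = assign_labels(example_logits)

# Hypothetical pool of two unlabeled synopses.
example_pool = {
    "synopsis_a": {"hypercellular": 4.0, "normal": -4.0},
    "synopsis_b": {"hypercellular": 0.1, "normal": -2.0},
}
next_to_label = most_uncertain(example_pool)
```

Treating each label independently is what lets a single synopsis receive several labels at once; a softmax over labels would instead force exactly one.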
Here we demonstrate that with a small amount of training data, a transformer-based natural language model can extract embeddings from pathology synopses that capture diagnostically relevant information. On average, these embeddings can be used to generate semantic labels mapping patients to probable diagnostic groups with a micro-average F1 score of 0.779 ± 0.025.
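For readers unfamiliar with the metric, a micro-average F1 score pools true positives, false positives, and false negatives over every (synopsis, label) pair before computing F1, so frequent labels carry more weight than rare ones. The following is a minimal sketch of that calculation on made-up data; the label names and the `micro_f1` helper are hypothetical and not taken from the study.

```python
from typing import List, Set


def micro_f1(true_labels: List[Set[str]], pred_labels: List[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label predictions.

    Counts are pooled across all samples and labels:
    F1 = 2*TP / (2*TP + FP + FN).
    """
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, pred_labels):
        tp += len(truth & pred)   # labels correctly predicted
        fp += len(pred - truth)   # labels predicted but not present
        fn += len(truth - pred)   # labels present but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical ground truth and predictions for three synopses.
truth = [
    {"normal"},
    {"acute leukemia", "hypercellular"},
    {"plasma cell neoplasm"},
]
predicted = [
    {"normal"},
    {"acute leukemia"},
    {"plasma cell neoplasm", "hypercellular"},
]

score = micro_f1(truth, predicted)  # TP=3, FP=1, FN=1 -> 6 / 8 = 0.75
```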
We provide a generalizable deep learning model and approach to unlock the semantic information inherent in pathology synopses toward improved diagnostics, biodiscovery and AI-assisted computational pathology.