University at Albany, State University of New York, 135 Western Ave., Draper 114A, Albany, NY 12222, USA.
Artif Intell Med. 2010 Oct;50(2):63-73. doi: 10.1016/j.artmed.2010.05.006. Epub 2010 Jun 19.
We describe semantic relation (SR) classification on medical discharge summaries. We focus on relations that support the creation of problem-oriented records, and we therefore define relations that involve patients' medical problems.
We represent patients' medical problems by their diseases and symptoms. We study the relations of these problems with each other and with concepts that are identified as tests and treatments. We present an SR classifier that processes a corpus of patient records one sentence at a time. For every pair of concepts that appears in a sentence, the classifier determines the relations between them. To do so, it extracts surface, lexical, and syntactic features and uses them as input to a support vector machine. We apply the classifier to two sets of medical discharge summaries, one obtained from the Beth Israel-Deaconess Medical Center (BIDMC), Boston, MA, and the other from Partners Healthcare, Boston, MA.
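The sentence-level pairing step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Concept` type, the function names, the feature names, and the example sentence are all hypothetical, and restricting candidates to pairs that include at least one problem is an assumption drawn from the stated focus on relations involving patients' problems. The sketch covers only candidate generation and a few lexical and surface features; syntactic features and the support vector machine itself are omitted.

```python
from itertools import combinations
from typing import NamedTuple


class Concept(NamedTuple):
    """A concept span inside one tokenized sentence (hypothetical type)."""
    start: int   # index of the first token (inclusive)
    end: int     # index one past the last token (exclusive)
    ctype: str   # "problem", "test", or "treatment"


def candidate_pairs(concepts):
    """Return all concept pairs in a sentence that include a problem.

    Assumption: only pairs involving at least one problem are candidates.
    """
    ordered = sorted(concepts, key=lambda c: c.start)
    return [
        (left, right)
        for left, right in combinations(ordered, 2)
        if "problem" in (left.ctype, right.ctype)
    ]


def inter_concept_tokens(tokens, left, right):
    """Lower-cased tokens that occur between the two candidate concepts."""
    return [t.lower() for t in tokens[left.end:right.start]]


def pair_features(tokens, left, right):
    """Build a sparse feature dictionary for one candidate pair."""
    feats = {f"between={t}": 1 for t in inter_concept_tokens(tokens, left, right)}
    feats[f"types={left.ctype}-{right.ctype}"] = 1   # concept type pattern
    feats["distance"] = right.start - left.end       # surface feature: gap size
    return feats


# Illustrative sentence, not taken from either corpus.
tokens = "She was given antibiotics for her pneumonia after a chest x-ray".split()
concepts = [
    Concept(3, 4, "treatment"),   # antibiotics
    Concept(6, 7, "problem"),     # pneumonia
    Concept(9, 11, "test"),       # chest x-ray
]

pairs = candidate_pairs(concepts)
features = [pair_features(tokens, left, right) for left, right in pairs]
```

Feature dictionaries of this kind could then be vectorized and passed to any SVM implementation; the choice of library and kernel is left open here because the abstract does not specify it.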
On the BIDMC corpus, our SR classifier achieves micro-averaged F-measures that range from 74% to 95% on the various relation types. On the Partners corpus, the micro-averaged F-measures on the various relation types range from 68% to 91%. Our experiments show that lexical features (in particular, tokens that occur between candidate concepts, which we refer to as inter-concept tokens) are very informative for relation classification in medical discharge summaries. Using only the inter-concept tokens in the corpus, our SR classifier can recognize 84% of the relations in the BIDMC corpus and 72% of the relations in the Partners corpus.
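For readers unfamiliar with micro-averaging, the following Python sketch shows how a micro-averaged F-measure is commonly computed: true positives, false positives, and false negatives are pooled across relation labels before precision and recall are calculated. The function name, the label names, and the counts are hypothetical and purely illustrative; they are not results from the BIDMC or Partners corpora.

```python
def micro_f_measure(counts):
    """Micro-averaged F-measure from per-label tp/fp/fn counts.

    Counts are pooled across labels first, so frequent labels weigh more
    than rare ones.
    """
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up counts for two hypothetical relation labels.
example_counts = {
    "relation_a": {"tp": 8, "fp": 2, "fn": 2},
    "relation_b": {"tp": 1, "fp": 1, "fn": 3},
}

score = micro_f_measure(example_counts)  # pooled: tp=9, fp=3, fn=5
```

With these made-up counts the pooled precision is 9/12 and the pooled recall is 9/14, so the rare label contributes little to the final score, which is the characteristic behavior of micro-averaging.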
These results are promising for semantic indexing of medical records. They imply that we can take advantage of lexical patterns in discharge summaries for relation classification at a sentence level.