Stetson Peter D, Johnson Stephen B, Scotch Matthew, Hripcsak George
Department of Medical Informatics, Columbia University, New York, NY, USA.
Proc AMIA Symp. 2002:742-6.
At Columbia-Presbyterian Medical Center, clinicians type free-text "Signout" notes into the electronic record for the purpose of cross-coverage. In a subsequent project, we plan to use Natural Language Processing (NLP) to "unlock" the information about adverse events contained in these notes. To better understand the requirements for parsing them, we compared Signout notes with other common medical notes (ambulatory clinic notes and discharge summaries) on a series of quantitative metrics. Signout notes are shorter (mean length 59.25 words vs. 144.11 and 340.85 for ambulatory and discharge notes, respectively) and use more abbreviations (26.88% vs. 20.07% and 3.57%). Despite being terser, Signout notes use fewer ambiguous abbreviations (8.34% vs. 9.09% and 18.02%). Using Relative Entropy and Squared Chi-square Distance in a novel fashion to compare these medical corpora, we found differences among them. Signout notes appear to constitute a unique sublanguage of medicine. We discuss the implications for parsing free-text cross-coverage notes into coded medical data.
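The corpus metrics and the two distance measures named in the abstract can be sketched roughly as follows. This is a minimal illustration under our own assumptions, not the authors' implementation. Notes are tokenized by lowercasing and splitting on whitespace. Word distributions are Laplace-smoothed over the union vocabulary, because relative entropy is undefined when Q(x) = 0 for a word with P(x) > 0. Relative entropy is reported in bits. The squared chi-square distance uses the symmetric form sum over x of (P(x) - Q(x))^2 / (P(x) + Q(x)); some definitions add a factor of 1/2. The abbreviation lexicon passed to the hypothetical helper `abbreviation_rate` is assumed, and the rate is computed as a percentage of all tokens, which is itself an assumption about the denominator.

```python
import math
from collections import Counter


def tokenize(note):
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return note.lower().split()


def mean_length(corpus):
    """Mean number of tokens per note."""
    return sum(len(tokenize(note)) for note in corpus) / len(corpus)


def abbreviation_rate(corpus, abbreviations):
    """Percentage of tokens found in a (hypothetical) abbreviation lexicon."""
    tokens = [t for note in corpus for t in tokenize(note)]
    return 100.0 * sum(t in abbreviations for t in tokens) / len(tokens)


def smoothed_distribution(counts, vocab, alpha=1.0):
    """Additive (Laplace) smoothing over a shared vocabulary, so no probability is zero."""
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts.get(w, 0) + alpha) / total for w in vocab}


def relative_entropy(p, q):
    """D(P || Q) = sum_x P(x) * log2(P(x) / Q(x)), in bits; asymmetric."""
    return sum(px * math.log2(px / q[w]) for w, px in p.items() if px > 0)


def squared_chi_square_distance(p, q):
    """Symmetric form: sum_x (P(x) - Q(x))^2 / (P(x) + Q(x))."""
    return sum((p[w] - q[w]) ** 2 / (p[w] + q[w]) for w in p if p[w] + q[w] > 0)


def compare_corpora(corpus_a, corpus_b, alpha=1.0):
    """Compare the smoothed word distributions of two lists of note strings."""
    counts_a = Counter(t for note in corpus_a for t in tokenize(note))
    counts_b = Counter(t for note in corpus_b for t in tokenize(note))
    vocab = set(counts_a) | set(counts_b)
    p = smoothed_distribution(counts_a, vocab, alpha)
    q = smoothed_distribution(counts_b, vocab, alpha)
    return {
        "D(A||B)": relative_entropy(p, q),
        "D(B||A)": relative_entropy(q, p),
        "chi2": squared_chi_square_distance(p, q),
    }
```

Because relative entropy is asymmetric, the sketch returns both directions, while the chi-square distance is symmetric. Both are zero only when the two smoothed distributions are identical, so larger values indicate more dissimilar vocabularies. For example, `compare_corpora(signout_notes, discharge_notes)` would take two lists of note strings; the variable names are placeholders.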