Tao Carson, Filannino Michele, Uzuner Özlem
Department of Information Science, State University of New York at Albany, NY, USA.
Department of Computer Science, State University of New York at Albany, NY, USA.
J Biomed Inform. 2017 Aug;72:60-66. doi: 10.1016/j.jbi.2017.07.002. Epub 2017 Jul 4.
In medical practice, doctors document patients' care plans in discharge summaries written as unstructured free text, which contain, among other things, medication names and prescription information. Extracting prescriptions from discharge summaries is challenging because of the way these documents are written. Handwritten rules and medical gazetteers have proven useful for this purpose but are limited in performance, scalability, and generalizability. We instead present a machine learning approach that extracts medication names and prescription information and organizes them into individual entries. Our approach uses word embeddings and tackles the task in two extraction steps, both of which are treated as sequence labeling problems. When evaluated on the official benchmark set of the 2009 i2b2 Challenge, the proposed approach achieves a horizontal phrase-level F1-measure of 0.864, which to the best of our knowledge represents an improvement over the current state of the art.
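To make the sequence labeling framing concrete, the following is a minimal sketch and not the authors' implementation. It assumes a BIO tagging scheme, which the abstract does not specify, and uses hypothetical label names (`MED`, `DOSE`, `FREQ`, `REASON`) and hypothetical helpers (`decode_bio`, `phrase_f1`). The score it computes is a simplified exact-match phrase-level F1 over labeled spans, not the official horizontal metric of the 2009 i2b2 Challenge.

```python
from typing import List, Set, Tuple

# A labeled phrase: (start token index, end index exclusive, label).
Span = Tuple[int, int, str]


def decode_bio(tags: List[str]) -> List[Span]:
    """Turn a BIO tag sequence into labeled spans (assumed tagging scheme)."""
    spans: List[Span] = []
    start, label = None, None
    # A trailing "O" sentinel closes any span that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.append((start, i, label))
            start, label = None, None
        # Be lenient: an "I-" tag with no open span starts a new one.
        if tag.startswith("B-") or tag.startswith("I-"):
            start, label = i, tag[2:]
    return spans


def phrase_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-match phrase-level F1 between two BIO tag sequences."""
    gold_spans: Set[Span] = set(decode_bio(gold))
    pred_spans: Set[Span] = set(decode_bio(pred))
    true_pos = len(gold_spans & pred_spans)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_spans)
    recall = true_pos / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example sentence and labels, for illustration only.
tokens = ["Start", "aspirin", "81", "mg", "daily", "for", "chest", "pain"]
gold = ["O", "B-MED", "B-DOSE", "I-DOSE", "B-FREQ", "O", "B-REASON", "I-REASON"]
pred = ["O", "B-MED", "B-DOSE", "I-DOSE", "O", "O", "B-REASON", "I-REASON"]

score = phrase_f1(gold, pred)
```

In this toy example the prediction misses the frequency phrase, so three of four gold phrases are recovered with no false positives. In a two-step setup such as the one the abstract describes, a second pass would still be needed to group the extracted phrases into per-medication entries; that step is not shown here.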