Takahiro Suzuki, Shunsuke Doi, Yutaka Hatakeyama, Masayuki Honda, Yasushi Matsumura, Gen Shimada, Mitsuhiro Takasaki, Shusaku Tsumoto, Hideto Yokoi, Katsuhiko Takabayashi
Department of Medical Informatics and Management, Chiba University Hospital.
Department of Welfare and Medical Intelligence, Chiba University Hospital.
Stud Health Technol Inform. 2015;216:1120.
We conducted a multi-year project to collect discharge summaries from multiple hospitals, built a large text database to establish a common document vector space, and developed various applications on top of it. We extracted 243,907 discharge summaries from seven hospitals. Term structure and the number of terms differed between hospitals; however, the differences by disease were similar. We built the vector space using the TF-IDF method and performed a cross-match analysis of Diagnosis Procedure Combination (DPC) selection among the seven hospitals. About 80% of cases were correctly matched. Using model data from other hospitals reduced the selection rate to around 10%; however, integrated model data from all hospitals restored the selection rate.
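The abstract gives no implementation details, so the following is only a minimal sketch of how a TF-IDF vector space and a cross-hospital match could be set up. It assumes whitespace tokenization, a smoothed IDF, and nearest-centroid matching by cosine similarity; none of these choices is confirmed by the source. The summaries, the labels "cardiac" and "respiratory", and all function names are invented for illustration and are not real DPC codes or study data.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Assumption: whitespace tokenization; real clinical text needs more care.
    return text.lower().split()

def fit_idf(docs):
    # Smoothed inverse document frequency over a list of token lists.
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def unit(vec):
    # Scale a sparse vector (dict) to unit length; empty if the norm is zero.
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def tfidf(tokens, idf):
    # Terms unseen during training are dropped from the vector.
    tf = Counter(tokens)
    return unit({t: tf[t] * idf[t] for t in tf if t in idf})

def dot(a, b):
    # Dot product of sparse vectors; equals cosine similarity for unit vectors.
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def train(summaries):
    # summaries: list of (text, label). Builds one unit centroid per label.
    docs = [tokenize(text) for text, _ in summaries]
    idf = fit_idf(docs)
    sums = defaultdict(lambda: defaultdict(float))
    for tokens, (_, label) in zip(docs, summaries):
        for t, w in tfidf(tokens, idf).items():
            sums[label][t] += w
    return idf, {label: unit(vec) for label, vec in sums.items()}

def predict(text, model):
    idf, centroids = model
    vec = tfidf(tokenize(text), idf)
    return max(centroids, key=lambda label: dot(vec, centroids[label]))

def accuracy(model, summaries):
    hits = sum(predict(text, model) == label for text, label in summaries)
    return hits / len(summaries)

# Invented toy data: two hospitals describing the same diseases in different terms.
hospital_a = [
    ("chest pain stent placed coronary angiography", "cardiac"),
    ("coronary stenosis stent chest pain", "cardiac"),
    ("cough fever pneumonia antibiotics", "respiratory"),
    ("pneumonia fever sputum antibiotics", "respiratory"),
]
hospital_b = [
    ("ami pci performed troponin elevated", "cardiac"),
    ("troponin elevated pci ami", "cardiac"),
    ("infiltrate on xray dyspnea abx given", "respiratory"),
    ("dyspnea infiltrate abx", "respiratory"),
]

model_a = train(hospital_a)                    # single-hospital model
model_pooled = train(hospital_a + hospital_b)  # integrated model

# For brevity these scores reuse the training summaries; a real evaluation
# would hold out test data.
own_score = accuracy(model_a, hospital_a)
cross_score = accuracy(model_a, hospital_b)
pooled_score = accuracy(model_pooled, hospital_b)
```

In this toy setup the two hospitals share no vocabulary, so a model built from one hospital has nothing to match against the other's summaries, while a model built from the pooled summaries covers both vocabularies. This only illustrates the kind of effect the abstract describes and does not reproduce its figures.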