Yang Zhenjun, Matsumura Yasushi, Kuwata Shigeki, Kusuoka Hideo, Takeda Hiroshi
Graduate School of Medicine, Department of Medical Information Science, Osaka University, Yamadaoka, Suita, Osaka, Japan.
J Med Syst. 2003 Jun;27(3):271-82. doi: 10.1023/a:1022527528856.
We propose a method for retrieving similar cases from a laboratory test results database, in which the data are mainly numerical or ordinal. We transformed the raw data into ordinal ranks and then into new scores lying between 0 and 1, and calculated Mahalanobis distances on these scores as the similarity measure. We first used 3000 cases of blood count data. For 100 sample cases, 95% of the 20 most similar cases retrieved by our method were also among the 20 most similar cases according to the reference criterion (Mahalanobis distances calculated from the raw data). Next, we applied our method to data relevant to thyroid diseases. For 96 sample cases, the 10 most similar cases were retrieved from 1655 cases, and 32.4% of the retrieved cases had a diagnosis consistent with that of the corresponding sample case. When Euclidean distance was used instead, this figure fell to 27.7%. Our method thus proved suitable for identifying similar cases in complex laboratory test data.
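The abstract names the pipeline (rank transform, scores between 0 and 1, Mahalanobis distance, top-k retrieval) but not the exact mapping from ranks to scores. The Python sketch below assumes a mid-rank percentile score, (average rank minus 0.5) / n, computed per test item across all cases, with tied values sharing an averaged rank, and uses a pseudo-inverse of the score covariance. All function names are hypothetical; this illustrates the idea under those assumptions and is not the authors' implementation.

```python
import numpy as np


def rank_scores(data):
    """Map each test item (column) to scores in (0, 1) via mid-ranks.

    Assumption: score = (average rank - 0.5) / n. Tied values, which are
    common in ordinal data, share the same averaged rank.
    """
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    scores = np.empty_like(data)
    for j in range(p):
        order = np.argsort(data[:, j], kind="mergesort")
        sorted_vals = data[order, j]
        ranks = np.empty(n)
        i = 0
        while i < n:
            k = i
            # Extend k over the run of values tied with sorted_vals[i].
            while k + 1 < n and sorted_vals[k + 1] == sorted_vals[i]:
                k += 1
            ranks[order[i:k + 1]] = (i + k) / 2 + 1  # average 1-based rank
            i = k + 1
        scores[:, j] = (ranks - 0.5) / n
    return scores


def mahalanobis_to_all(x, query_idx):
    """Mahalanobis distance from case query_idx to every case in x."""
    cov = np.atleast_2d(np.cov(x, rowvar=False))
    inv = np.linalg.pinv(cov)  # pseudo-inverse tolerates a singular covariance
    diff = x - x[query_idx]
    d2 = np.einsum("ij,jk,ik->i", diff, inv, diff)
    return np.sqrt(np.maximum(d2, 0.0))


def euclidean_to_all(x, query_idx):
    """Euclidean distance from case query_idx to every case in x."""
    return np.linalg.norm(x - x[query_idx], axis=1)


def most_similar(dist, query_idx, k):
    """Indices of the k nearest cases, excluding the query itself."""
    order = np.argsort(dist, kind="stable")
    return [int(i) for i in order if i != query_idx][:k]


def overlap(retrieved, reference):
    """Fraction of retrieved cases also present in the reference set."""
    return len(set(retrieved) & set(reference)) / len(retrieved)


def diagnosis_agreement(labels, query_idx, neighbours):
    """Fraction of retrieved cases sharing the query case's diagnosis."""
    return sum(labels[i] == labels[query_idx] for i in neighbours) / len(neighbours)


if __name__ == "__main__":
    # Synthetic stand-in data: 300 cases x 4 correlated items (not the study data).
    rng = np.random.default_rng(0)
    base = rng.normal(size=(300, 1))
    raw = base + 0.5 * rng.normal(size=(300, 4))
    scores = rank_scores(raw)
    q = 0
    ours = most_similar(mahalanobis_to_all(scores, q), q, 20)
    criterion = most_similar(mahalanobis_to_all(raw, q), q, 20)
    print("overlap with raw-data criterion:", overlap(ours, criterion))
```

Ranking first makes each item's contribution insensitive to monotone rescaling and damps the effect of extreme values, while the Mahalanobis step still accounts for correlations between tests. Swapping `mahalanobis_to_all` for `euclidean_to_all` gives a Euclidean baseline of the kind the abstract compares against, and `diagnosis_agreement` mirrors the diagnosis-consistency measure when diagnosis labels are available.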