Lister Hill National Center for Biomedical Communications, National Library of Medicine, 8600 Rockville Pike, Building 38A/7N707, Bethesda, MD 20894, USA.
J Biomed Inform. 2012 Aug;45(4):642-50. doi: 10.1016/j.jbi.2012.04.012. Epub 2012 May 3.
Clinical databases provide a rich source of data for answering clinical research questions. However, the variables recorded in clinical data systems are often identified by local, idiosyncratic, and sometimes redundant and/or ambiguous names (or codes) rather than by unique, well-organized codes from standard code systems. This discourages research use of such databases, because researchers must invest considerable time in cleaning up the data before they can ask their first research question. Researchers at MIT developed MIMIC-II, a nearly complete collection of clinical data about intensive care patients. Because its data are drawn from existing clinical systems, it has many of the problems described above. In collaboration with the MIT researchers, we have begun cleaning up the data and mapping the variable names and codes to LOINC codes. Our first step, which we describe here, was to map all of the laboratory test observations to LOINC codes. We were able to map 87% of the unique laboratory tests, which together account for 94% of all laboratory test results. Of the 13% of tests that we could not map, nearly 60% had names whose real meaning could not be discerned, and 29% were tests not yet included in the LOINC table. These results suggest that LOINC covers most of the laboratory tests used in critical care. We have delivered this work to the MIMIC-II researchers, who have included it in the standard MIMIC-II database release so that future users of the database will not have to repeat this work.
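The 87% and 94% figures measure coverage in two different ways. The first counts each distinct local test once. The second weights each test by the number of results it produced. Because 13% of the unique tests account for only about 6% of results, the unmapped tests must, on average, have been performed less often than the mapped ones. The minimal Python sketch below computes both measures. The function name `mapping_coverage`, the local test names, and the placeholder codes (`LOINC-A` and so on) are hypothetical toy values. They are not MIMIC-II data or real LOINC codes.

```python
from collections import Counter


def mapping_coverage(results, mapping):
    """Return (unique-test coverage, result-weighted coverage) as fractions.

    results: iterable of local test identifiers, one entry per recorded result.
    mapping: dict from local test identifier to a code, or None if unmapped.
    (Hypothetical helper for illustration only.)
    """
    counts = Counter(results)  # number of results per distinct local test
    mapped_tests = [t for t in counts if mapping.get(t) is not None]
    # Each distinct test counts once.
    unique_cov = len(mapped_tests) / len(counts)
    # Each test is weighted by how many results it produced.
    result_cov = sum(counts[t] for t in mapped_tests) / sum(counts.values())
    return unique_cov, result_cov


# Hypothetical toy data: invented local names and placeholder codes.
mapping = {
    "NA": "LOINC-A",
    "K": "LOINC-B",
    "HGB": "LOINC-C",
    "XYZ-TEST": None,  # e.g. a name whose meaning could not be discerned
}
results = ["NA"] * 50 + ["K"] * 40 + ["HGB"] * 9 + ["XYZ-TEST"] * 1

unique_cov, result_cov = mapping_coverage(results, mapping)
print(f"unique tests mapped: {unique_cov:.0%}")  # unique tests mapped: 75%
print(f"results covered:     {result_cov:.0%}")  # results covered:     99%
```

In this toy example the single unmapped test is rare. It lowers unique-test coverage to 75% but lowers result-weighted coverage only to 99%. This mirrors the direction of the gap between the two percentages reported above.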