Dilling Thomas J
Department of Radiation Oncology, Moffitt Cancer Center.
Adv Radiat Oncol. 2020 Jul 13;5(6):1280-1285. doi: 10.1016/j.adro.2020.06.027. eCollection 2020 Nov-Dec.
Although many researchers talk about a "patient database," they typically are not referring to a database at all, but to a spreadsheet of curated facts about a cohort of patients. This article describes relational database systems and how they differ from spreadsheets. At their core, spreadsheets can describe only one-to-one (1:1) relationships. However, this article demonstrates that clinical medical data encapsulate numerous one-to-many relationships. Consequently, spreadsheets are very inefficient relative to relational database systems, which manage such data gracefully. Databases provide other advantages. Their data fields are "typed" (that is, each field holds a specific kind of data), which prevents users from entering spurious data during data import. Because each record contains a unique "key," it becomes impossible to add duplicate information (ie, to add the same patient twice). Databases store data very efficiently, minimizing space and memory requirements on the host system. Databases can also be queried and manipulated using a highly expressive language, Structured Query Language (SQL). Consequently, it becomes trivial to extract large amounts of data from many data fields for very precisely defined subsets of patients. Databases can be quite large (terabytes or more in size) yet remain highly efficient to query. With the explosion of data available in electronic health records and other data sources, databases are therefore becoming increasingly important for storing and organizing these data. Ultimately, this will enable the clinical researcher to perform artificial intelligence analyses across vast amounts of clinical data in a way heretofore impossible. This article provides initial guidance on creating a relational database system.
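The ideas summarized above (a one-to-many relationship, typed fields, a key that blocks duplicate patients, and an SQL query over a precise patient subset) can be sketched in a few lines. This is a minimal illustration, not the article's own schema: it uses Python's built-in `sqlite3` module, and the table names (`patient`, `treatment_course`), column names, and sample values are all hypothetical. Because SQLite is lenient about column types by default, the sketch adds explicit `CHECK` constraints to approximate the strict typing that other database systems enforce natively.

```python
import sqlite3

# Hypothetical schema: one patient can have many treatment courses (1:many).
SCHEMA = """
CREATE TABLE patient (
    mrn        TEXT PRIMARY KEY,  -- the key: the same patient cannot be added twice
    birth_year INTEGER NOT NULL CHECK (typeof(birth_year) = 'integer')
);
CREATE TABLE treatment_course (
    course_id INTEGER PRIMARY KEY,
    mrn       TEXT NOT NULL REFERENCES patient(mrn),
    site      TEXT NOT NULL,
    dose_gy   REAL NOT NULL CHECK (typeof(dose_gy) IN ('real', 'integer'))
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with made-up example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    with conn:  # commit the seed rows as one transaction
        conn.executemany(
            "INSERT INTO patient (mrn, birth_year) VALUES (?, ?)",
            [("A001", 1950), ("A002", 1962)],
        )
        conn.executemany(
            "INSERT INTO treatment_course (mrn, site, dose_gy) VALUES (?, ?, ?)",
            [("A001", "lung", 60.0), ("A001", "brain", 30.0),
             ("A002", "prostate", 78.0)],
        )
    return conn


def try_insert_patient(conn: sqlite3.Connection, mrn: str, birth_year) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # rolls back automatically if the insert fails
            conn.execute(
                "INSERT INTO patient (mrn, birth_year) VALUES (?, ?)",
                (mrn, birth_year),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def patients_with_multiple_courses(conn: sqlite3.Connection) -> list:
    """Query a precise subset: patients treated more than once."""
    return conn.execute(
        """
        SELECT p.mrn, COUNT(*) AS n_courses, SUM(t.dose_gy) AS total_dose_gy
        FROM patient AS p
        JOIN treatment_course AS t ON t.mrn = p.mrn
        GROUP BY p.mrn
        HAVING COUNT(*) > 1
        ORDER BY p.mrn
        """
    ).fetchall()
```

With these sample rows, `try_insert_patient(conn, "A001", 1950)` is rejected because the key already exists, `try_insert_patient(conn, "A003", "unknown")` is rejected because the value is not an integer, and `patients_with_multiple_courses(conn)` returns only patient `A001`. In a spreadsheet, the second treatment course would need either a duplicated patient row or an extra set of columns.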