O'Connor RW, Miller AK.
Northrop Services, Inc., Environmental Sciences, Research Triangle Park, North Carolina 27709.
Toxicology. 1987 Dec 1;47(1-2):109-18. doi: 10.1016/0300-483x(87)90163-6.
Traditional definitions of data quality deal primarily with individual data sets and the data collection process. Today's standards for ensuring data quality have not changed with respect to the desired results, but have simply been expanded to take advantage of modern technology. Computers are used to acquire, review, store, analyze, and report data. Because each of these steps can be automated, the need for human intervention and manual review is minimized. As a result, the potential for invalid data to reach the data analysis stage has increased significantly. To reduce this potential, efforts must be devoted to developing automated procedures that cover every conceivable validation check. Relationships between data and data sets must be well defined [1], and database support that gives ready access to the data for analysis must be provided. Such automation may therefore be impractical for small data sets, but for large, interrelated data sets it is highly desirable. Computer automation has thus expanded the traditional concept of ensuring data quality to include a complex array of interrelated tasks that must be properly managed to achieve the desired results.
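As a minimal sketch of the kind of automated validation the abstract describes (not taken from the paper itself), the following Python example applies a range check to individual records and a cross-data-set relationship check before any record reaches the analysis stage. All field names, identifiers, and limits here are hypothetical.

```python
# Hypothetical illustration: flag invalid records before analysis by
# combining a per-record range check with a check of the defined
# relationship between two data sets (observations must reference a
# known animal). Field names and limits are assumptions, not the
# authors' actual schema.

animals = [
    {"animal_id": "A01", "species": "rat"},
    {"animal_id": "A02", "species": "rat"},
]

observations = [
    {"animal_id": "A01", "day": 7, "weight_g": 210.5},
    {"animal_id": "A02", "day": 7, "weight_g": -4.0},   # out of range
    {"animal_id": "A99", "day": 7, "weight_g": 195.0},  # unknown animal
]

def validate(observations, animals, weight_range=(50.0, 600.0)):
    """Return (valid, errors); each error records why the record failed."""
    known_ids = {a["animal_id"] for a in animals}
    lo, hi = weight_range
    valid, errors = [], []
    for rec in observations:
        problems = []
        if rec["animal_id"] not in known_ids:   # inter-data-set relationship
            problems.append("animal_id not found in animal data set")
        if not (lo <= rec["weight_g"] <= hi):   # single-record range check
            problems.append(f"weight_g {rec['weight_g']} outside {weight_range}")
        if problems:
            errors.append((rec, problems))
        else:
            valid.append(rec)
    return valid, errors

valid, errors = validate(observations, animals)
print(f"{len(valid)} valid record(s); {len(errors)} flagged:")
for rec, problems in errors:
    print(" ", rec, "->", "; ".join(problems))
```

In this sketch the validation rules live in one routine that every acquired record passes through, which mirrors the abstract's point: once acquisition and review are automated, the checks themselves, and the defined relationships between data sets, are what stand between invalid data and the analysis stage.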