Connell F A, Diehr P, Hart L G
Annu Rev Public Health. 1987;8:51-74. doi: 10.1146/annurev.pu.08.050187.000411.
The growing number of large health data bases available represents a valuable resource for health care research. Many available data bases, however, have subtle and/or complex defects in their design as well as in the quality of the data themselves. The apparent ease and economy of using pre-collected data cannot eliminate the need for careful selection, examination, and analysis of these data. Existing documentation should be critically reviewed to assess the appropriateness of the data base for its intended use. Once the data are in hand, their completeness and coding should be examined in detail before any attempt is made to test hypotheses. In conducting data analysis, the investigator must be aware of the potential problems related to the size of the data base, the unit of analysis, and the sampling strategy, particularly if sampling involved stratification or clustering. Awareness of the potential pitfalls inherent in the use of large health data bases can help prevent many problems and disappointments, as well as improve the validity and efficiency of statistical analysis.
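As a rough illustration of two checks the abstract recommends, the sketch below measures field completeness before any analysis and then compares a naive mean with one weighted by stratum population size. It is not taken from the article: the record layout, the field names (`region`, `visits`), the population counts, and both helper functions are hypothetical, chosen only to show how ignoring a stratified design can shift an estimate.

```python
from collections import defaultdict


def missing_fraction(records, fields):
    """Share of records in which each field is absent or None."""
    n = len(records)
    return {f: sum(1 for r in records if r.get(f) is None) / n for f in fields}


def stratified_mean(records, value, stratum, population_sizes):
    """Mean of `value`, weighting each stratum mean by its population size."""
    groups = defaultdict(list)
    for r in records:
        if r.get(value) is not None:
            groups[r[stratum]].append(r[value])
    total = sum(population_sizes[s] for s in groups)
    return sum(
        population_sizes[s] / total * sum(v) / len(v) for s, v in groups.items()
    )


# Hypothetical toy sample: urban records are over-represented
# relative to the (assumed) population counts below.
records = [
    {"region": "urban", "visits": 2},
    {"region": "urban", "visits": 4},
    {"region": "urban", "visits": None},
    {"region": "rural", "visits": 10},
]
population_sizes = {"urban": 100, "rural": 900}

# Step 1: examine completeness before testing any hypothesis.
missing = missing_fraction(records, ["region", "visits"])

# Step 2: compare the design-weighted mean with the naive sample mean.
weighted = stratified_mean(records, "visits", "region", population_sizes)
observed = [r["visits"] for r in records if r["visits"] is not None]
naive = sum(observed) / len(observed)

print(missing, round(weighted, 2), round(naive, 2))
```

In this toy sample the naive mean is pulled toward the over-sampled urban stratum, while the weighted mean reflects the assumed population mix. A real analysis of a clustered or stratified data base would also need design-based variance estimates, which this sketch does not attempt.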