Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
Front Psychiatry. 2011 Jul 22;2:47. doi: 10.3389/fpsyt.2011.00047. eCollection 2011.
The success of research in the field of maternal-infant health, or in any scientific field, relies on the adoption of best practices for data and knowledge management. Prior work by our group and others has identified evidence-based solutions to many existing data management challenges, including cost-effective practices for ensuring high-quality data entry and for properly constructing and maintaining data standards and ontologies. Quality assurance practices for data entry and processing are necessary to ensure that data are not degraded during processing, but these practices have not been widely adopted in psychology and biology. Furthermore, collaborative research is becoming more common. Collaborative research often involves multiple laboratories, different scientific disciplines, numerous data sources, large data sets, and data sets from public and commercial sources. These factors present new challenges for data and knowledge management. Data security and privacy concerns increase because data may be accessed by investigators affiliated with different institutions. Collaborative groups must address the challenges of federating data access between the data-collecting sites and a centralized data management site. Merging ontologies across data sets can become formidable, especially in fields whose ontologies are still evolving. The increased use of automated data acquisition can yield more data, but it can also increase the risk of introducing errors or systematic biases. In addition, integrating data collected with different assay types often requires developing new analysis tools. All of these challenges increase the cost and time spent on data management for a given project, and they make it more likely that data quality will decline. In this paper, we review these issues and discuss theoretical and practical approaches for addressing them.