The Department of Clinical Sciences, Faculty of Veterinary Medicine and Animal Science, Swedish University of Agricultural Sciences, SE-750 07 Uppsala, Sweden.
Acta Vet Scand. 2011;53 Suppl 1(Suppl 1):S4. doi: 10.1186/1751-0147-53-S1-S4. Epub 2011 Jun 20.
In a world of limited resources, using existing databases in research is a potentially cost-effective way to increase knowledge, provided that correct and meaningful results are obtained.

Nordic examples of the use of secondary small animal and equine databases include studies based on data from tumour registries, breeding registries, young horse quality contest results, competition data, insurance databases, clinic data, prescription data and hunting ability tests. In spite of this extensive use of secondary databases, integration between databases is less common. The aim of this presentation is to briefly review key papers that exemplify different ways of utilizing data from multiple sources, to highlight the benefits and limitations of the approaches, to discuss key issues and challenges that must be addressed when integrating data, and to suggest future directions.

Data from pedigree databases have been merged at the individual level with competition data and young horse quality contest data, and true integration has also been achieved with canine insurance data and with equine clinical data. Data have also been merged at the postal code level; for example, insurance data were merged with a digitized map of Sweden, and meteorological information was then added.

In addition to all the data quality and validity issues inherent in the use of a single database, further obstacles arise when information from several databases is combined. The loss of individuals due to incorrect or mismatched identifying information can be considerable. If any biases affect whether individuals can be properly linked, misinformation may result, and power may be further reduced. Issues of confidentiality may be more difficult to address across multiple databases. For example, information that identifies people must be protected, yet it may be required to ensure valid merging of data.
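The linkage-loss and linkage-bias concerns above can be illustrated with a minimal Python sketch. It assumes deterministic (exact) matching on a single identifier field; the field names `animal_id` and `breed`, the `normalise_id` cleaning rule and the toy records are all hypothetical and are not taken from any of the databases reviewed. A record is left unlinked when it has no match or more than one match, and the share of unlinked records is reported per group, because a loss rate that differs between groups is one sign that linkage may bias the merged data.

```python
from collections import Counter


def normalise_id(raw):
    # Hypothetical cleaning rule: upper-case and keep only letters and digits,
    # so "se 002" and "SE-002" both become "SE002".
    return "".join(ch for ch in raw.upper() if ch.isalnum())


def link(records_a, records_b, key="animal_id"):
    """Exact deterministic linkage on a normalised identifier.

    Returns (linked, unlinked): linked is a list of (record_a, record_b)
    pairs; unlinked holds records from A with zero or several matches in B.
    """
    index_b = {}
    for rec in records_b:
        index_b.setdefault(normalise_id(rec[key]), []).append(rec)

    linked, unlinked = [], []
    for rec in records_a:
        matches = index_b.get(normalise_id(rec[key]), [])
        if len(matches) == 1:
            linked.append((rec, matches[0]))
        else:
            # No match, or an ambiguous match (duplicate IDs in B): not linked.
            unlinked.append(rec)
    return linked, unlinked


def loss_by_group(records_a, unlinked, group="breed"):
    """Fraction of records in each group of A that could not be linked."""
    total = Counter(r[group] for r in records_a)
    lost = Counter(r[group] for r in unlinked)
    return {g: lost[g] / n for g, n in total.items()}


if __name__ == "__main__":
    # Hypothetical toy records, for illustration only.
    insurance = [
        {"animal_id": "SE-001", "breed": "A"},
        {"animal_id": "se 002", "breed": "A"},
        {"animal_id": "SE-003", "breed": "B"},
        {"animal_id": "SE-004", "breed": "B"},
    ]
    pedigree = [
        {"animal_id": "SE001"},
        {"animal_id": "SE002"},
        {"animal_id": "SE003"},
        {"animal_id": "SE003"},  # duplicate ID makes SE-003 ambiguous
    ]
    linked, unlinked = link(insurance, pedigree)
    print(len(linked), "linked;", len(unlinked), "unlinked")
    print(loss_by_group(insurance, unlinked))
```

In this toy run every record of group B is lost (one has no match, one is ambiguous) while group A links completely, so an analysis of the merged data would under-represent group B. Real linkage work may also use probabilistic matching on several fields; this sketch covers only the simplest exact-match case.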
There is great potential to better address complex issues of health and disease in companion animals and horses by integrating information across existing databases. The challenges outlined in this article should not preclude the ongoing pursuit of this approach.