Hulsegge B, de Greef KH
Animal Breeding and Genomics, Wageningen Livestock Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands.
Prev Vet Med. 2018 May 1;153:64-70. doi: 10.1016/j.prevetmed.2018.03.003. Epub 2018 Mar 6.
A large amount of data is routinely collected during meat inspection in pig slaughterhouses. A time series clustering approach is presented and applied that groups farms based on similar statistical characteristics of meat inspection data over time. A three-step characteristic-based clustering approach was used, based on the idea that the data contain more information than the incidence figures alone. A stratified subset containing 511,645 pigs was derived as a study set from 3.5 years of meat inspection data. The monthly average incidences of pleuritis and of pneumonia of 44 Dutch farms (delivering 5149 batches to 2 pig slaughterhouses) were subjected to 1) derivation of farm-level data characteristics, 2) factor analysis, and 3) clustering into groups of farms. The characteristic-based clustering was able to cluster farms for both lung aberrations. Three groups of data characteristics were informative, describing incidence, time pattern and degree of autocorrelation. The consistency of clustering similar farms was confirmed by repeating the analysis on a larger dataset. The robustness of the clustering was tested on a substantially extended dataset. This confirmed the earlier results: three data distribution aspects account for most of the distinction between groups of farms, and within these groups (clusters) most farms were allocated comparably to the earlier allocation (75% and 62% for pleuritis and pneumonia, respectively). The difference between pleuritis and pneumonia in their seasonal dependency was confirmed, supporting the biological relevance of the clustering. Comparing the identified clusters of statistically comparable farms can be used to detect farm-level risk factors causing the health aberrations, beyond comparisons based on disease incidence and trend alone.
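The three-step procedure (characteristics, factor analysis, clustering) can be illustrated with a minimal sketch. It assumes one monthly incidence series per farm and is not the authors' implementation: the chosen characteristics (mean, standard deviation, linear trend, 12-month seasonal amplitude, lag-1 autocorrelation), the use of principal component analysis as a simpler stand-in for factor analysis, k-means as the clustering algorithm, and all function names are hypothetical choices for illustration. The `allocation_agreement` helper mirrors the kind of re-allocation percentage reported above by counting farms that land in the same cluster after the best relabelling of a second run.

```python
import itertools

import numpy as np


def lag1_autocorr(x):
    """Lag-1 autocorrelation of a 1-D array (0.0 for a constant series)."""
    d = x - x.mean()
    denom = np.dot(d, d)
    if denom < 1e-12:
        return 0.0
    return float(np.dot(d[:-1], d[1:]) / denom)


def farm_characteristics(series):
    """Step 1 (hypothetical feature set): summarise one farm's monthly series.

    Features roughly cover incidence (mean, std), time pattern (trend,
    seasonal amplitude) and autocorrelation.
    """
    x = np.asarray(series, dtype=float)
    t = np.arange(len(x))
    slope = np.polyfit(t, x, 1)[0]  # linear trend per month
    # Seasonal amplitude from a least-squares fit of a 12-month sine/cosine pair.
    design = np.column_stack([np.ones(len(t)),
                              np.sin(2 * np.pi * t / 12),
                              np.cos(2 * np.pi * t / 12)])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    amplitude = np.hypot(coef[1], coef[2])
    return np.array([x.mean(), x.std(), slope, amplitude, lag1_autocorr(x)])


def principal_scores(features, n_components=3):
    """Step 2 stand-in: PCA on standardised features (not a true factor analysis)."""
    std = features.std(axis=0)
    std = np.where(std > 0, std, 1.0)  # avoid dividing by zero for constant features
    z = (features - features.mean(axis=0)) / std
    eigval, eigvec = np.linalg.eigh(np.cov(z, rowvar=False))
    order = np.argsort(eigval)[::-1][:n_components]  # largest components first
    return z @ eigvec[:, order]


def kmeans(points, k, n_iter=100, seed=0):
    """Step 3: plain Lloyd's k-means; returns one cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old center if a cluster becomes empty.
        new_centers = np.array([points[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def cluster_farms(series_by_farm, k=3, n_components=3):
    """Run steps 1-3 on a list of per-farm monthly series."""
    features = np.vstack([farm_characteristics(s) for s in series_by_farm])
    return kmeans(principal_scores(features, n_components), k)


def allocation_agreement(labels_a, labels_b, k):
    """Share of farms in the same cluster after the best relabelling of run b."""
    labels_a = np.asarray(labels_a)
    best = 0
    for perm in itertools.permutations(range(k)):
        mapped = np.array([perm[label] for label in labels_b])
        best = max(best, int(np.sum(mapped == labels_a)))
    return best / len(labels_a)


if __name__ == "__main__":
    # Synthetic data only: 44 farms x 42 months (3.5 years), not the study data.
    rng = np.random.default_rng(1)
    months = np.arange(42)
    farms = []
    for i in range(44):
        base = 0.10 + 0.05 * (i % 2)  # two incidence levels
        season = 0.04 * (i % 3 == 0) * np.sin(2 * np.pi * months / 12)
        farms.append(np.clip(base + season + rng.normal(0, 0.01, 42), 0, 1))
    print(cluster_farms(farms, k=3))
```

Running `cluster_farms` a second time on an extended set of series and passing both label vectors to `allocation_agreement` gives the fraction of farms allocated comparably. Permutation matching is only practical for a small number of clusters; a larger k would call for an assignment solver instead.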