Harry Butler Institute, Murdoch University, Murdoch, WA, Australia.
School of Mathematical Sciences, Queensland University of Technology, Brisbane, QLD, Australia.
PLoS One. 2022 Aug 9;17(8):e0272413. doi: 10.1371/journal.pone.0272413. eCollection 2022.
Appropriate inspection protocols and mitigation strategies are critical components of effective biosecurity measures, enabling sound management decisions to be implemented. Statistical models for analyzing biosecurity surveillance data are integral to this decision-making process. Our research focuses on analyzing border interception biosecurity data collected at Barrow Island, a Class A Nature Reserve in Western Australia, together with the associated covariates describing both spatial and temporal interception patterns. A clustering analysis approach was adopted using a generalization of the popular k-means algorithm appropriate for mixed-type data. The analysis compared the efficiency of clustering using only the numerical data with that of clustering after covariates were added. Based on numerical data only, three clusters gave an acceptable fit and provided information about the underlying data characteristics. Incorporating covariates into the model suggested four distinct clusters dominated by physical location and type of detection. Clustering increases the interpretability of complex models and is useful in data mining for highlighting patterns that describe underlying processes in biosecurity and other research areas. Availability of more relevant data would greatly improve the model. Based on the outcomes of our research, we recommend broader use of cluster models for biosecurity data, with testing of these models on more datasets to validate the model choice and identify important explanatory variables.
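To illustrate how a k-means-style algorithm can be generalized to mixed-type data, the following is a minimal sketch of k-prototypes clustering, which combines squared Euclidean distance on numerical columns with a mismatch count on categorical columns. The abstract does not name the specific algorithm or software used, so the choice of k-prototypes is an assumption, as are the function name `k_prototypes`, the weight `gamma`, and the toy data, which are not the Barrow Island records.

```python
import numpy as np

def k_prototypes(X_num, X_cat, k, gamma=1.0, n_iter=20, seed=0):
    """Illustrative k-prototypes sketch (assumed technique, not the authors' code).

    X_num: (n, p) float array of numerical features.
    X_cat: (n, q) integer-coded categorical features.
    gamma: hypothetical weight balancing categorical against numerical distance.
    """
    rng = np.random.default_rng(seed)
    n = X_num.shape[0]
    # Initialise prototypes from k distinct, randomly chosen observations.
    idx = rng.choice(n, size=k, replace=False)
    centers_num = X_num[idx].astype(float)
    centers_cat = X_cat[idx].copy()
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Numerical part: squared Euclidean distance to each prototype.
        d_num = ((X_num[:, None, :] - centers_num[None, :, :]) ** 2).sum(axis=2)
        # Categorical part: number of mismatched categories per prototype.
        d_cat = (X_cat[:, None, :] != centers_cat[None, :, :]).sum(axis=2)
        new_labels = np.argmin(d_num + gamma * d_cat, axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so stop iterating
        labels = new_labels
        for j in range(k):
            members = labels == j
            if not members.any():
                continue  # keep the old prototype for an empty cluster
            # Numerical prototype: column means; categorical prototype: column modes.
            centers_num[j] = X_num[members].mean(axis=0)
            for c in range(X_cat.shape[1]):
                values, counts = np.unique(X_cat[members, c], return_counts=True)
                centers_cat[j, c] = values[np.argmax(counts)]
    return labels, centers_num, centers_cat

# Toy data: one numerical column and one integer-coded categorical column.
X_num = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
X_cat = np.array([[0], [0], [0], [1], [1], [1]])
labels, centers_num, centers_cat = k_prototypes(X_num, X_cat, k=2)
print(labels)
```

In practice the numerical columns would usually be standardized first, and `gamma` chosen so that neither feature type dominates the distance; comparing a numerical-only fit with a fit that includes covariates then amounts to running the clustering with and without the categorical columns.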