Text classification to streamline online wildlife trade analyses.

Affiliations

Invasion Science & Wildlife Ecology Lab, University of Adelaide, Adelaide, SA, Australia.

School of Mathematical Sciences, University of Adelaide, Adelaide, SA, Australia.

Publication information

PLoS One. 2021 Jul 9;16(7):e0254007. doi: 10.1371/journal.pone.0254007. eCollection 2021.

DOI:10.1371/journal.pone.0254007

PMID:34242279

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8270201/

Abstract

Automated monitoring of websites that trade wildlife is increasingly necessary to inform conservation and biosecurity efforts. However, e-commerce and wildlife trading websites can contain a vast number of advertisements, an unknown proportion of which may be irrelevant to researchers and practitioners. Given that many wildlife-trade advertisements have an unstructured text format, automated identification of relevant listings has not traditionally been possible, nor attempted. Other scientific disciplines have solved similar problems using machine learning and natural language processing models, such as text classifiers. Here, we test the ability of a suite of text classifiers to extract relevant advertisements from wildlife trade occurring on the Internet. We collected data from an Australian classifieds website where people can post advertisements of their pet birds (n = 16.5k advertisements). We found that text classifiers can predict, with a high degree of accuracy, which listings are relevant (ROC AUC ≥ 0.98, F1 score ≥ 0.77). Furthermore, in an attempt to answer the question 'how much data is required to have an adequately performing model?', we conducted a sensitivity analysis by simulating decreases in sample sizes to measure the subsequent change in model performance. From our sensitivity analysis, we found that text classifiers required a minimum sample size of 33% (c. 5.5k listings) to accurately identify relevant listings (for our dataset), providing a reference point for future applications of this sort. Our results suggest that text classification is a viable tool that can be applied to the online trade of wildlife to reduce time dedicated to data cleaning. However, the success of text classifiers will vary depending on the advertisements and websites, and will therefore be context dependent. 
Further work to integrate other machine learning tools, such as image classification, may provide better predictive abilities in the context of streamlining data processing for wildlife trade related online data.
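
The workflow summarised in the abstract (train a text classifier on labelled listings, score it with ROC AUC and F1, then shrink the training set to see how performance changes) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not name the classifiers used, so a small multinomial naive Bayes model stands in; the listings are synthetic toy strings produced by a hypothetical `make_listings` helper; and the training fractions are arbitrary. It is not the authors' code or data.

```python
import math
import random
from collections import Counter


def tokenize(text):
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (stand-in model, an assumption)."""

    def fit(self, texts, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.class_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def score(self, text):
        """Log-odds that a listing is relevant (class 1); tokens outside the vocabulary are skipped."""
        total = sum(self.class_counts.values())
        log_odds = 0.0
        for cls, sign in ((1, 1.0), (0, -1.0)):
            log_p = math.log((self.class_counts[cls] + 1) / (total + 2))
            denom = sum(self.word_counts[cls].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:
                    log_p += math.log((self.word_counts[cls][tok] + 1) / denom)
            log_odds += sign * log_p
        return log_odds


def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Probability that a random relevant listing outscores a random irrelevant one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_listings(n, rng):
    """Hypothetical toy data: relevant ads mention birds, irrelevant ads mention accessories."""
    birds = ["budgie", "cockatiel", "galah", "lorikeet", "finch"]
    gear = ["cage", "aviary", "seed", "perch", "feeder"]
    filler = ["for", "sale", "pickup", "only", "cheap", "healthy", "new"]
    texts, labels = [], []
    for _ in range(n):
        relevant = rng.random() < 0.5
        words = rng.sample(birds if relevant else gear, 2) + rng.sample(filler, 3)
        rng.shuffle(words)
        texts.append(" ".join(words))
        labels.append(int(relevant))
    return texts, labels


def sensitivity(fractions=(1.0, 0.66, 0.33, 0.1), seed=0):
    """Retrain on progressively smaller random subsets and score on a fixed test set."""
    rng = random.Random(seed)
    train_x, train_y = make_listings(300, rng)
    test_x, test_y = make_listings(100, rng)
    results = {}
    for frac in fractions:
        k = max(1, int(len(train_x) * frac))
        idx = rng.sample(range(len(train_x)), k)
        model = NaiveBayes().fit([train_x[i] for i in idx], [train_y[i] for i in idx])
        scores = [model.score(t) for t in test_x]
        preds = [1 if s > 0 else 0 for s in scores]
        results[frac] = (roc_auc(test_y, scores), f1_score(test_y, preds))
    return results


if __name__ == "__main__":
    for frac, (auc, f1) in sensitivity().items():
        print(f"train fraction {frac:.2f}: ROC AUC = {auc:.3f}, F1 = {f1:.3f}")
```

On real advertisements the curve of ROC AUC and F1 against training fraction is what would indicate a minimum workable sample size; the toy vocabulary here is far easier to separate than genuine listings, so the numbers it prints say nothing about the 33% threshold reported for the paper's dataset.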

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2469/8270201/0f50b6c7b1c4/pone.0254007.g001.jpg

Similar articles

Text classification to streamline online wildlife trade analyses.

PLoS One. 2021 Jul 9;16(7):e0254007. doi: 10.1371/journal.pone.0254007. eCollection 2021.

A guide to using the internet to monitor and quantify the wildlife trade.

Conserv Biol. 2021 Aug;35(4):1130-1139. doi: 10.1111/cobi.13675. Epub 2021 Mar 8.

Gaps in global wildlife trade monitoring leave amphibians vulnerable.

Elife. 2021 Aug 12;10:e70086. doi: 10.7554/eLife.70086.

Identifying opportunities for expert-mediated triangulation in monitoring wildlife trade on social media.

Conserv Biol. 2022 Apr;36(2):e13858. doi: 10.1111/cobi.13858. Epub 2022 Jan 17.

Early warning of trends in commercial wildlife trade through novel machine-learning analysis of patent filing.

Nat Commun. 2024 Aug 1;15(1):6379. doi: 10.1038/s41467-024-49688-x.

Estimating the extent and structure of trade in horticultural orchids via social media.

Conserv Biol. 2016 Oct;30(5):1038-47. doi: 10.1111/cobi.12721. Epub 2016 May 17.

Digital surveillance: a novel approach to monitoring the illegal wildlife trade.

PLoS One. 2012;7(12):e51156. doi: 10.1371/journal.pone.0051156. Epub 2012 Dec 7.

Assessing the extent and nature of wildlife trade on the dark web.

Conserv Biol. 2016 Aug;30(4):900-4. doi: 10.1111/cobi.12707. Epub 2016 Apr 28.

A comparison of rule-based and machine learning approaches for classifying patient portal messages.

Int J Med Inform. 2017 Sep;105:110-120. doi: 10.1016/j.ijmedinf.2017.06.004. Epub 2017 Jun 23.

Novel detection of provenance in the illegal wildlife trade using elemental data.

Sci Rep. 2018 Oct 18;8(1):15380. doi: 10.1038/s41598-018-33786-0.

Cited by

The changing landscape of text mining: a review of approaches for ecology and evolution.

Proc Biol Sci. 2024 Jul;291(2027):20240423. doi: 10.1098/rspb.2024.0423. Epub 2024 Jul 31.

Quantifying global colonization pressures of alien vertebrates from wildlife trade.

Nat Commun. 2023 Nov 30;14(1):7914. doi: 10.1038/s41467-023-43754-6.

References

Use of Machine Learning to Detect Wildlife Product Promotion and Sales on Twitter.

Front Big Data. 2019 Aug 27;2:28. doi: 10.3389/fdata.2019.00028. eCollection 2019.

A guide to using the internet to monitor and quantify the wildlife trade.

Conserv Biol. 2021 Aug;35(4):1130-1139. doi: 10.1111/cobi.13675. Epub 2021 Mar 8.

Online auction marketplaces as a global pathway for aquatic invasive species.

Hydrobiologia. 2021;848(9):1967-1979. doi: 10.1007/s10750-020-04407-7. Epub 2020 Sep 17.

iEcology: Harnessing Large Online Resources to Generate Ecological Insights.

Trends Ecol Evol. 2020 Jul;35(7):630-639. doi: 10.1016/j.tree.2020.03.003. Epub 2020 Apr 10.

Wildlife trade shifts from brick-and-mortar markets to virtual marketplaces: A case study of birds of prey trade in Thailand.

J Asia Pac Biodivers. 2020 Sep 1;13(3):454-461. doi: 10.1016/j.japb.2020.03.012. Epub 2020 Mar 25.

Global wildlife trade across the tree of life.

Science. 2019 Oct 4;366(6461):71-76. doi: 10.1126/science.aav5327.

Deep learning for environmental conservation.

Curr Biol. 2019 Oct 7;29(19):R977-R982. doi: 10.1016/j.cub.2019.08.016.

Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning.

Proc Natl Acad Sci U S A. 2018 Jun 19;115(25):E5716-E5725. doi: 10.1073/pnas.1719367115. Epub 2018 Jun 5.

A framework for investigating illegal wildlife trade on social media with machine learning.

Conserv Biol. 2019 Feb;33(1):210-213. doi: 10.1111/cobi.13104. Epub 2018 Nov 14.

Leaky doors: Private captivity as a prominent source of bird introductions in Australia.

PLoS One. 2017 Feb 24;12(2):e0172851. doi: 10.1371/journal.pone.0172851. eCollection 2017.
