University of Quebec at Trois-Rivières, Canada; Laboratoire de Recherche en Criminalistique, Trois-Rivières, Canada; Centre International de Criminologie Comparee, Montreal, Canada.
Centre International de Criminologie Comparee, Montreal, Canada.
Forensic Sci Int. 2021 May;322:110753. doi: 10.1016/j.forsciint.2021.110753. Epub 2021 Mar 15.
Fibre population surveys are a necessary part of forensic fibre examination. They provide valuable information on which fibres are most common and help estimate the likelihood of observing similar properties in a fibre unrelated to the event. The time needed to carry out such studies is, however, a major obstacle to their wider use. With the advent of e-commerce and digital computation, collecting information from digital sources and structuring it in a convenient way may provide meaningful information on fibre populations. Data collection has become more affordable, and researchers can now devote most of their time to extracting meaningful information from the structured data. In this article, we used Scrapy with a Kibana/Elasticsearch interface to crawl and scrape a major online clothing retailer. In less than 24 h we extracted 68 text-based fields describing a total of 24,701 garments to help provide precise estimates of fibre type and color frequencies. Our data indicate that cotton, polyester, viscose and elastane are the 4 main types of fibres used in the textile industry. Elastane, while very common in garments, rarely accounts for more than 10% of the mass, whereas cotton accounts for up to 80% of content. The most common colors are white, black, and blue, with important dependencies on the fibre type. Through further statistics and examples we demonstrate that web scraping techniques have the potential to provide near real-time population studies that can greatly benefit forensic practitioners.
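As an illustration of the kind of post-processing such scraped records allow, the following is a minimal sketch in Python, not the pipeline used in the article. It assumes each garment carries a composition label of the form "80% cotton, 20% polyester"; the label format, the sample values, and the names `SAMPLE_LABELS`, `parse_composition` and `fibre_statistics` are hypothetical and chosen for illustration only.

```python
import re
from collections import Counter, defaultdict

# Hypothetical composition labels, as a scraper might store them.
# These values are invented for illustration and are NOT data from the study.
SAMPLE_LABELS = [
    "80% cotton, 20% polyester",
    "95% cotton, 5% elastane",
    "100% polyester",
    "92% viscose, 8% elastane",
]

# Matches "<number>% <fibre name>", e.g. "95% cotton".
PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*%\s*([a-z]+)", re.IGNORECASE)


def parse_composition(label):
    """Return {fibre_name: percent} for one composition label."""
    return {fibre.lower(): float(pct) for pct, fibre in PATTERN.findall(label)}


def fibre_statistics(labels):
    """For each fibre, compute the share of garments containing it and
    its mean percentage in the garments where it is present."""
    counts = Counter()
    totals = defaultdict(float)
    for label in labels:
        for fibre, pct in parse_composition(label).items():
            counts[fibre] += 1
            totals[fibre] += pct
    n = len(labels)
    return {
        fibre: {
            "garment_share": counts[fibre] / n,
            "mean_percent": totals[fibre] / counts[fibre],
        }
        for fibre in counts
    }


if __name__ == "__main__":
    for fibre, stats in sorted(fibre_statistics(SAMPLE_LABELS).items()):
        print(fibre, stats["garment_share"], stats["mean_percent"])
```

Keeping "how often a fibre appears" separate from "how much of the garment it makes up" mirrors the distinction drawn in the abstract between elastane's frequent presence and its low mass fraction.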