Development of a two-layer machine learning model for the forensic application of legal and illegal poppy classification based on sequence data.

Affiliations

Department of Biotechnology, Sangmyung University, Seoul 03016, the Republic of Korea.

Institute of Intelligence Informatics Technology, Sangmyung University, Seoul 03016, the Republic of Korea.

Publication information

Forensic Sci Int Genet. 2024 Jul;71:103061. doi: 10.1016/j.fsigen.2024.103061. Epub 2024 May 22.

DOI:10.1016/j.fsigen.2024.103061

PMID:38820740

Abstract

Poppies are useful plants with medicinal, edible, ornamental, and industrial applications. Some Papaver species are forensically significant because they contain opium, a narcotic substance. Because of this forensic value, internationally trafficked illegal poppy species are identified by DNA barcoding with multiple markers. However, which markers allow precise species identification of legal and illegal poppies is still under discussion: research on illegal poppies has focused on Papaver somniferum L., while species identification studies of Papaver bracteatum and Papaver setigerum DC. are still lacking. To evaluate the performance of genetic markers and classify DNA sequences in the genus Papaver, this study developed the first machine learning-based two-layer model, in which the first layer classifies a given sequence as a legal or illegal poppy and the second layer identifies the species of illegal poppies from their sequences. We constructed the dataset and investigated biological features from four markers, internal transcribed spacer 1 (ITS1), internal transcribed spacer 2 (ITS2), transfer RNA leucine (trnL), and the transfer RNA leucine-transfer RNA phenylalanine intergenic spacer (trnL-trnF intergenic spacer), and their combination, using four machine learning algorithms: K-nearest neighbor (KNN), Naïve Bayes (NB), extreme gradient boosting (XGBoost), and Random Forest (RF). For Layer 1, which classifies legal and illegal poppies, KNN-based models using the combined ITS region achieved the best performance, with accuracies of 0.846 and 0.889 on the training and test sets, respectively. For Layer 2, which identifies illegal poppy species, KNN-based models using the combined ITS region achieved the best performance, 0.833 and 1.000 on the training and test sets, respectively.
To validate the model, the combined ITS region (ITS1 and ITS2 sequences) from blind poppy samples was used as a case study; Layer 1 classified legal and illegal poppies with an accuracy over 0.830. Layer 2 correctly identified P. setigerum DC., but only one of the three P. somniferum L. samples was accurately identified. Nevertheless, our research shows that machine learning can classify and identify legal and illegal poppy species from DNA barcodes, which can serve as an efficient and effective forensic tool for improved law enforcement and a safer society.
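
The two-layer design described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how sequences were turned into features, so the k-mer frequency encoding, the value of K, the 1-nearest-neighbour setting, and the class and function names below are all assumptions made for illustration. The sequences and labels in TOY are synthetic placeholders, not real Papaver barcode data.

```python
import math
from collections import Counter
from itertools import product

K = 3  # k-mer length: an assumption, the abstract does not specify the features
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_vector(seq, k=K):
    """Return normalized k-mer frequencies (length 4**K) for a DNA sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in KMERS) or 1  # avoid division by zero
    return [counts[m] / total for m in KMERS]


def knn_predict(train, query, n_neighbors=1):
    """Majority vote among the nearest training vectors (Euclidean distance)."""
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]


class TwoLayerClassifier:
    """Hypothetical two-layer KNN: Layer 1 legal/illegal, Layer 2 species."""

    def __init__(self, n_neighbors=1):
        self.n_neighbors = n_neighbors
        self.layer1 = []  # (vector, "legal" or "illegal")
        self.layer2 = []  # (vector, species), illegal samples only

    def fit(self, records):
        """records: iterable of (sequence, status, species)."""
        for seq, status, species in records:
            vec = kmer_vector(seq)
            self.layer1.append((vec, status))
            if status == "illegal":
                self.layer2.append((vec, species))
        return self

    def predict(self, seq):
        """Return (status, species); species is None for legal sequences."""
        vec = kmer_vector(seq)
        status = knn_predict(self.layer1, vec, self.n_neighbors)
        if status != "illegal":
            return status, None
        return status, knn_predict(self.layer2, vec, self.n_neighbors)


# Synthetic placeholder sequences and labels, for illustration only.
TOY = [
    ("ACGTACGTACGTACGTACGT", "legal", "species_A"),
    ("TTTTGGGGTTTTGGGGTTTT", "illegal", "species_B"),
    ("CCCCAAAACCCCAAAACCCC", "illegal", "species_C"),
]

model = TwoLayerClassifier().fit(TOY)
print(model.predict("TTTTGGGGTTTTGGGGTTTT"))
```

In this sketch Layer 2 is trained only on sequences labelled illegal and is consulted only when Layer 1 predicts "illegal", which mirrors the cascade described above; a real pipeline would also need held-out evaluation on training and test sets as reported in the abstract.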

Similar articles

Development of diagnostic SNP markers and a novel SNP genotyping assay for distinguishing opium poppies.

Forensic Sci Int. 2022 Oct;339:111416. doi: 10.1016/j.forsciint.2022.111416. Epub 2022 Aug 4.

Genetic and chemical components analysis of Papaver setigerum naturalized in Korea.

Forensic Sci Int. 2012 Oct 10;222(1-3):387-93. doi: 10.1016/j.forsciint.2012.08.002. Epub 2012 Aug 24.

An assessment of the utility of universal and specific genetic markers for opium poppy identification.

J Forensic Sci. 2010 Sep;55(5):1202-8. doi: 10.1111/j.1556-4029.2010.01423.x.

A new minisatellite VNTR marker, Pscp1, discovered for the identification of opium poppy.

Forensic Sci Int Genet. 2021 Nov;55:102581. doi: 10.1016/j.fsigen.2021.102581. Epub 2021 Aug 20.

Evaluation of chloroplast DNA barcoding markers to individualize Papaver somniferum for forensic intelligence purposes.

Int J Legal Med. 2024 Jan;138(1):267-275. doi: 10.1007/s00414-022-02862-6. Epub 2022 Jul 5.

Forensic Application of Genetic and Toxicological Analyses for the Identification and Characterization of the Opium Poppy (Papaver somniferum L.).

Biology (Basel). 2022 Apr 27;11(5):672. doi: 10.3390/biology11050672.

Molecular identification and phylogenetic analysis of Papaver based on ITS2 barcoding.

J Forensic Sci. 2022 Mar;67(2):712-719. doi: 10.1111/1556-4029.14925. Epub 2021 Nov 1.

Exploiting expressed sequence tag databases for the development and characterization of gene-derived simple sequence repeat markers in the opium poppy (Papaver somniferum L.) for forensic applications.

J Forensic Sci. 2011 Sep;56(5):1131-5. doi: 10.1111/j.1556-4029.2011.01810.x. Epub 2011 May 19.

Evaluation of 19 short tandem repeat markers for individualization of Papaver somniferum.

Sci Justice. 2020 May;60(3):253-262. doi: 10.1016/j.scijus.2019.12.002. Epub 2019 Dec 13.
