ALGORITMI, Campus de Gualtar, University of Minho, Rua da Universidade, 4710-057 Braga, Portugal.
Instituto de Investigação e Inovação em Saúde (i3S), Universidade do Porto, Rua Alfredo Allen 208, 4200-135 Porto, Portugal.
Genes (Basel). 2021 Jan 2;12(1):61. doi: 10.3390/genes12010061.
Forensic genetics is a fast-growing field that frequently requires DNA-based taxonomy, notably when the evidence consists of parts of specimens, often highly processed into food, potions, or ointments. Reference DNA sequence libraries, such as BOLD or GenBank, are essential tools for taxonomic assignment, particularly when morphology is inadequate for classification. Auditing and curating these datasets requires reliable mechanisms, preferably with automated data preprocessing. Software tools have been developed to grade these datasets using the number of records as the primary criterion, which does not comply with forensic standards, where the priority is validation from independent sources. 4SpecID is an efficient software tool developed to audit and annotate reference libraries, designed specifically for forensic applications. Its intuitive, user-friendly interface can access virtually any database and includes data mining functions tuned for the widely used BOLD repositories. The tool was evaluated on a MacBook laptop and a dual-Xeon server with a large BOLD dataset (36,115 records); the best execution time to grade the dataset on the laptop was 0.28 s. Datasets of [...] and [...] families were used to evaluate the quality of the tool and the relevance of validation from independent sources.