Knecht Siam, Santos Fréderic, Ardagna Yann, Alunni Véronique, Adalian Pascal, Nogueira Luísa
Aix Marseille Univ, CNRS, EFS, ADES, 13007, Marseille, France.
Institut Universitaire d'Anthropologie Médico-Légale, Faculté de Médecine, Université Côte d'Azur, 28 Avenue de Valombrose, 06107, Cedex 2, Nice, France.
Int J Legal Med. 2023 Nov;137(6):1887-1895. doi: 10.1007/s00414-023-03072-4. Epub 2023 Aug 1.
Sex estimation from skeletal remains is one of the crucial issues in forensic anthropology. Long bones can be a valid alternative for sex estimation when more dimorphic bones are absent or degraded, which rules out first-line methods. The purpose of this study was to generate and compare classification models for sex estimation based on combined measurements of long bones using machine learning classifiers. Eighteen measurements from four long bones (radius, humerus, femur, and tibia) were taken from a total of 2141 individuals. Five machine learning methods were employed to predict sex: linear discriminant analysis (LDA), penalized logistic regression (PLR), random forest (RF), support vector machine (SVM), and artificial neural network (ANN). The classification algorithms using all bones generated highly accurate models with cross-validation, with accuracy ranging from 90 to 92% on the validation sample. Classification with isolated bones ranged between 83.3 and 90.3% accuracy on the validation sample. In both cases, random forest stood out with the highest accuracy and seems to be the best model for this investigation. This study upholds the value of combined long bones for sex estimation and provides models that can be applied with high accuracy to different populations.
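As a rough illustration of the kind of workflow the abstract describes, the sketch below fits a two-class linear discriminant analysis (one of the five methods named above) and scores it with plain k-fold cross-validation. It is not the authors' implementation. The data are synthetic, the two measurements are hypothetical stand-ins for long-bone dimensions, the means and spreads are invented for illustration, and the function names (`make_synthetic`, `fit_lda`, `predict`, `cross_val_accuracy`) are assumptions of this example. The resulting accuracy says nothing about the accuracies reported in the study.

```python
import random


def make_synthetic(n_per_sex=200, seed=0):
    """Generate synthetic two-measurement data in mm. NOT the study's data."""
    rng = random.Random(seed)
    data = []
    # Illustrative, invented class means for two hypothetical measurements.
    for label, (m1, m2) in (("F", (420.0, 300.0)), ("M", (455.0, 325.0))):
        for _ in range(n_per_sex):
            shared = rng.gauss(0, 1)  # shared size factor correlates the two
            x1 = m1 + 18.0 * shared + rng.gauss(0, 8)
            x2 = m2 + 13.0 * shared + rng.gauss(0, 6)
            data.append(((x1, x2), label))
    rng.shuffle(data)
    return data


def fit_lda(train):
    """Two-class LDA with a pooled 2x2 covariance and equal priors."""
    groups = {"F": [], "M": []}
    for x, y in train:
        groups[y].append(x)
    mu = {
        k: (sum(p[0] for p in v) / len(v), sum(p[1] for p in v) / len(v))
        for k, v in groups.items()
    }
    # Pooled within-class scatter, divided by (n - number of classes).
    s11 = s12 = s22 = 0.0
    for k, v in groups.items():
        for p in v:
            d1, d2 = p[0] - mu[k][0], p[1] - mu[k][1]
            s11 += d1 * d1
            s12 += d1 * d2
            s22 += d2 * d2
    dof = len(train) - 2
    s11, s12, s22 = s11 / dof, s12 / dof, s22 / dof
    det = s11 * s22 - s12 * s12
    # Discriminant direction w = S^-1 (mu_M - mu_F), using the 2x2 inverse.
    dm1, dm2 = mu["M"][0] - mu["F"][0], mu["M"][1] - mu["F"][1]
    w1 = (s22 * dm1 - s12 * dm2) / det
    w2 = (-s12 * dm1 + s11 * dm2) / det
    # Cut-off at the projection of the midpoint between the class means.
    c = w1 * (mu["M"][0] + mu["F"][0]) / 2 + w2 * (mu["M"][1] + mu["F"][1]) / 2
    return w1, w2, c


def predict(model, x):
    """Classify one individual from its two measurements."""
    w1, w2, c = model
    return "M" if w1 * x[0] + w2 * x[1] > c else "F"


def cross_val_accuracy(data, k=5):
    """Mean accuracy over k folds (simple, non-stratified split)."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [row for j, f in enumerate(folds) if j != i for row in f]
        model = fit_lda(train)
        correct = sum(predict(model, x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / k


if __name__ == "__main__":
    print(f"5-fold CV accuracy (synthetic data): {cross_val_accuracy(make_synthetic()):.3f}")
```

Comparing several classifiers, as the study does, would amount to swapping `fit_lda` and `predict` for other models while keeping the same folds, so that every method is scored on identical held-out individuals.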