Gene Expression and Regulation Laboratory (GEaRLab), Department of Biochemistry and Molecular Biology, Faculty of Biological Sciences, University of Concepción, Concepción, Chile.
Department of Anthropology and Sociology, Faculty of Social Sciences, University of Concepción, Concepción, Chile.
Biol Res. 2024 Oct 24;57(1):75. doi: 10.1186/s40659-024-00552-8.
Determining the postmortem interval (PMI) accurately remains a significant challenge in forensic science, especially for intervals greater than 5 years (late PMI). Traditional methods often fail because soft tissues are extensively degraded, so examiners must rely on bone material. The precision of PMI estimation diminishes over time and drops to about 50% accuracy for intervals between 1 and 5 years. This study aims to address this problem by identifying key protein biomarkers through proteomics and machine learning, with the goal of improving the accuracy of PMI estimation for intervals exceeding 15 years.
Proteomic analysis of skeletal remains, specifically tibiae and ribs, was conducted by LC-MS/MS. Proteins were identified with two strategies: a tryptic-specific search and a semitryptic search. The semitryptic search is particularly useful when proteins have degraded naturally, because it also accepts peptides in which only one terminus results from trypsin cleavage. A Random Forest algorithm was used to model protein abundance data and predict PMI. A thorough screening process combining feature importance scores and SHAP values was used to identify the proteins most informative for model training and accuracy.
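The difference between the two search strategies comes down to how many peptide termini must match trypsin's cleavage rule (cut after K or R, classically not before P). The Python sketch below labels a peptide as tryptic (both termini match), semitryptic (one terminus matches) or nonspecific (neither matches) relative to its parent protein. The function names, the made-up sequence and the `proline_rule` switch are illustrative assumptions, not the search-engine settings used in the study.

```python
def cleaves_after(before: str, after: str, proline_rule: bool = True) -> bool:
    """True if trypsin would cut between residues `before` and `after`."""
    # Trypsin cuts C-terminal to Lys (K) or Arg (R); under the classic rule
    # it does not cut when the following residue is Pro (P).
    return before in "KR" and not (proline_rule and after == "P")


def classify_peptide(protein: str, peptide: str, proline_rule: bool = True) -> str:
    """Label a peptide 'tryptic', 'semitryptic' or 'nonspecific' within its protein."""
    if not peptide:
        raise ValueError("peptide must be non-empty")
    start = protein.find(peptide)
    if start == -1:
        raise ValueError(f"{peptide!r} does not occur in the protein sequence")
    best = -1
    while start != -1:
        end = start + len(peptide)  # index just past the peptide's last residue
        # A protein terminus counts as a valid cleavage site.
        n_ok = start == 0 or cleaves_after(protein[start - 1], protein[start], proline_rule)
        c_ok = end == len(protein) or cleaves_after(protein[end - 1], protein[end], proline_rule)
        best = max(best, int(n_ok) + int(c_ok))
        # A peptide may occur more than once; keep its most tryptic placement.
        start = protein.find(peptide, start + 1)
    return ("nonspecific", "semitryptic", "tryptic")[best]


protein = "MAGKLLSERTTPKPAGR"  # made-up sequence, not from the study
for pep in ["LLSER", "LLSE", "LSE", "TTPK"]:
    print(pep, classify_peptide(protein, pep))
# prints:
# LLSER tryptic
# LLSE semitryptic
# LSE nonspecific
# TTPK semitryptic
```

A fully tryptic search would keep only the first peptide; a semitryptic search would also keep the second and fourth, which is how fragments produced by natural degradation rather than by the digest can still be identified.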
A minimal set of three biomarkers (K1C13, PGS1, and CO3A1) was identified that significantly improved prediction accuracy when distinguishing PMIs of 15 and 20 years. The model, based on protein abundance data from semitryptic peptides in tibia samples, achieved 100% accuracy consistently across 100 iterations. In contrast, unsupervised methods such as PCA and MCA did not yield comparable results. Additionally, semitryptic peptides outperformed tryptic peptides, particularly in tibia proteomes, suggesting that they may be more reliable for late PMI prediction.
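The claim of sustained accuracy depends on how the 100 iterations are run. Reading them as 100 independent stratified train/test splits (an assumption; the abstract does not define the term), the Python sketch below retrains a classifier on each split and records test accuracy. To stay dependency-free it uses a nearest-centroid classifier as a stand-in for the study's Random Forest. The function names, the test fraction and the toy abundance values are hypothetical and are not data from the study.

```python
import random
from statistics import fmean


def fit_centroids(X, y):
    """Mean feature vector per class (nearest-centroid stand-in for Random Forest)."""
    centroids = {}
    for label in dict.fromkeys(y):  # preserves first-seen class order
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [fmean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def repeated_holdout(X, y, n_iter=100, test_frac=0.25, seed=0):
    """Per-iteration test accuracies from repeated stratified random splits."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    accuracies = []
    for _ in range(n_iter):
        train, test = [], []
        for members in by_class.values():
            members = members[:]  # copy before shuffling
            rng.shuffle(members)
            # Hold out test_frac of each class but keep at least one training sample.
            n_test = min(len(members) - 1, max(1, round(test_frac * len(members))))
            test += members[:n_test]
            train += members[n_test:]
        if not test:
            raise ValueError("need at least two samples in some class")
        centroids = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(centroids, X[i]) == y[i] for i in test)
        accuracies.append(correct / len(test))
    return accuracies


# Toy abundances for three hypothetical features (NOT data from the study).
X = [[1.0, 0.1, 0.2], [1.1, 0.0, 0.3], [0.9, 0.2, 0.1], [1.2, 0.1, 0.2],
     [5.0, 3.1, 2.2], [5.2, 2.9, 2.0], [4.8, 3.0, 2.1], [5.1, 3.2, 2.3]]
y = [15, 15, 15, 15, 20, 20, 20, 20]  # PMI class in years
accs = repeated_holdout(X, y)
print(min(accs), fmean(accs))  # prints: 1.0 1.0
```

Reporting the minimum alongside the mean shows whether accuracy held in every split rather than only on average. Swapping in a Random Forest would change only `fit_centroids` and `predict`.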
Despite limitations such as sample size and PMI range, this study demonstrates the feasibility of combining proteomics and machine learning for accurate late PMI prediction. Future research should cover broader PMI ranges and additional bone types to further refine and standardize forensic proteomic methods for PMI estimation.