SpecieScan: semi-automated taxonomic identification of bone collagen peptides from MALDI-ToF-MS.

Affiliations

Department of Evolutionary Anthropology, University of Vienna, University Biology Building, A-1030 Vienna, Austria.

Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Vienna, Austria.

Publication information

Bioinformatics. 2024 Mar 4;40(3). doi: 10.1093/bioinformatics/btae054.

Abstract

MOTIVATION

Zooarchaeology by Mass Spectrometry (ZooMS) is a palaeoproteomics method for the taxonomic determination of collagen. It traditionally relies on challenging manual analysis of spectra and yields limited quantitative results. As the ZooMS reference database expands, a faster and more reproducible identification tool is necessary. Here we present SpecieScan, an open-access algorithm for automating taxa identification from raw MALDI-ToF mass spectrometry (MS) data.

RESULTS

SpecieScan was developed using R (pre-processing) and Python (automation). The algorithm's output includes identified peptide markers, the closest matching taxonomic group (taxon, family, order), correlation scores with the reference databases, and contaminant peaks present in the spectra. Testing on original MS data from bones discovered at Palaeolithic archaeological sites, including Denisova Cave in Russia, as well as on publicly available, externally produced data, we achieved >90% accuracy at the genus level and ∼92% accuracy at the family level for mammalian bone collagen previously analysed manually.
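The kind of output described above (matched markers, a ranked closest taxon, and flagged contaminant peaks) can be illustrated with a minimal Python sketch. This is not the SpecieScan implementation: the marker and contaminant m/z values are made-up placeholders, the names `REFERENCE`, `CONTAMINANTS`, `score_taxa`, and `find_contaminants` are hypothetical, the 0.3 tolerance is an arbitrary assumption, and a simple matched-marker fraction stands in for the correlation score the abstract mentions.

```python
from typing import Dict, List, Tuple

# Hypothetical reference markers (m/z). Placeholder values for illustration
# only; they are NOT taken from the SpecieScan reference database.
REFERENCE: Dict[str, List[float]] = {
    "taxon_A": [1105.6, 1427.7, 2131.1, 2883.4],
    "taxon_B": [1105.6, 1453.7, 2145.1, 2899.4],
}

# Hypothetical contaminant peaks (m/z), also placeholders.
CONTAMINANTS: List[float] = [1475.8, 2211.1]


def matches(peak: float, target: float, tol: float) -> bool:
    """Return True if an observed peak lies within +/- tol of a target m/z."""
    return abs(peak - target) <= tol


def score_taxa(
    peaks: List[float],
    reference: Dict[str, List[float]],
    tol: float = 0.3,  # assumed tolerance, not a documented SpecieScan setting
) -> List[Tuple[str, float, List[float]]]:
    """Rank taxa by the fraction of their reference markers seen in the peaks."""
    scores = []
    for taxon, markers in reference.items():
        found = [m for m in markers if any(matches(p, m, tol) for p in peaks)]
        scores.append((taxon, len(found) / len(markers), found))
    # Highest matched fraction first.
    scores.sort(key=lambda s: s[1], reverse=True)
    return scores


def find_contaminants(
    peaks: List[float], contaminants: List[float], tol: float = 0.3
) -> List[float]:
    """List the known contaminant m/z values that appear among the peaks."""
    return [c for c in contaminants if any(matches(p, c, tol) for p in peaks)]


# Toy peak list standing in for a pre-processed spectrum.
observed = [1105.7, 1427.6, 1475.9, 2131.0, 2883.5]
ranking = score_taxa(observed, REFERENCE)
flagged = find_contaminants(observed, CONTAMINANTS)

best_taxon, best_score, best_markers = ranking[0]
print(best_taxon, best_score, best_markers, flagged)
```

A real workflow would start from peak lists produced by the R pre-processing step and use the reference database and contaminant lists distributed with the tool rather than hard-coded values.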

AVAILABILITY AND IMPLEMENTATION

The SpecieScan algorithm, the raw data used in testing, the results, the reference database, and the lists of common contaminants are freely available on GitHub (https://github.com/mesve/SpecieScan).
