Utilization of a natural language processing-based approach to determine the composition of artifact residues.

Author information

Department of Mathematics, The University of Chicago, Chicago, IL, 60637, USA.

Department of Molecular Genetics and Cell Biology, The University of Chicago, Chicago, IL, 60637, USA.

Publication information

BMC Bioinformatics. 2024 Sep 27;25(1):311. doi: 10.1186/s12859-024-05888-2.

DOI:10.1186/s12859-024-05888-2

PMID:39333884

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11437931/

Abstract

BACKGROUND

Determining the composition of artifact residues is a central problem in ancient residue metabolomics. The standard method does this by comparing the mass spectral features that an experimental artifact and an ancient artifact have in common. While this method is simple and straightforward, we sought to increase the accuracy of predicting which plant species had been used in which artifacts.
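As a rough illustration of the shared-feature comparison described above, the following Python sketch counts the features an unknown artifact has in common with each reference sample and picks the reference with the largest overlap. It assumes that mass spectral features have already been aligned into discrete identifiers; the function names, the species labels, and the feature IDs are hypothetical and are not taken from the paper.

```python
def shared_feature_count(reference, artifact):
    """Number of aligned feature IDs present in both samples."""
    return len(set(reference) & set(artifact))


def predict_species(references, artifact):
    """Return the reference species sharing the most features with the artifact.

    Ties are resolved in favor of the species listed first.
    """
    return max(
        references,
        key=lambda species: shared_feature_count(references[species], artifact),
    )


# Toy data (assumption): made-up IDs standing in for aligned mass spectral peaks.
references = {
    "species_A": {"f1", "f2", "f3", "f4"},
    "species_B": {"f3", "f4", "f5"},
}
unknown_pipe = {"f1", "f2", "f4", "f9"}

print(predict_species(references, unknown_pipe))
```

In this toy data the unknown pipe shares three features with species_A and one with species_B, so the raw count favors species_A. Every shared feature carries the same weight here, regardless of how many reference samples contain it.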

RESULTS

Here, we introduce an algorithm (new method) based on ideas from the field of natural language processing (NLP) to solve this problem. We tested our strategy on a set of modern clay pipes. To limit biases, we were not provided information on which plant species had been smoked in which clay pipes. The results indicate that our new method performed 12.5% better than the standard method in predicting the plant species smoked in each artifact.
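The abstract does not say which NLP idea the new method borrows. Purely as an illustration, the sketch below assumes an inverse-document-frequency (IDF) weighting of the kind used in TF-IDF: each reference sample is treated as a "document" and each aligned feature as a "word", so features found in many references count for less than features found in few. This is a stand-in under that assumption, not the authors' algorithm, and all names and data are hypothetical.

```python
import math


def idf_weights(references):
    """IDF weight per feature: log(number of references / references containing it)."""
    n = len(references)
    doc_freq = {}
    for features in references.values():
        for f in set(features):
            doc_freq[f] = doc_freq.get(f, 0) + 1
    return {f: math.log(n / count) for f, count in doc_freq.items()}


def weighted_score(reference, artifact, idf):
    """Sum of IDF weights over the features shared by reference and artifact."""
    return sum(idf[f] for f in set(reference) & set(artifact))


def predict_species(references, artifact):
    """Return the reference species with the highest IDF-weighted overlap."""
    idf = idf_weights(references)
    return max(
        references,
        key=lambda species: weighted_score(references[species], artifact, idf),
    )


# Toy data (assumption): c* features are shared by two species, b* are unique to one.
references = {
    "species_A": {"c1", "c2", "c3", "a1"},
    "species_B": {"b1", "b2", "b3"},
    "species_C": {"c1", "c2", "c3", "x1"},
}
unknown_pipe = {"c1", "c2", "c3", "b1", "b2"}

print(predict_species(references, unknown_pipe))
```

In this toy data a raw shared-feature count would favor species_A or species_C (three shared features each), while the IDF-weighted score favors species_B, because its two shared features occur in no other reference. Whether such weighting helps on real residue data is an empirical question that this sketch does not address.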

CONCLUSIONS

Using an NLP-based approach, we developed a robust algorithm for characterizing the composition of artifact residues. This work also discusses other general applications of our algorithm in the field of metabolomics, such as datasets with a limited number of replicates.
