Aoyagi Satoka, Fujita Miya, Itoh Hidemi, Itoh Hiroto, Nagatomi Takaharu, Okamoto Masayuki, Ueno Tomikazu
Faculty of Science and Technology, Seikei University, Musashino, Tokyo 180-8633, Japan.
JSR Corporation, 100 Kawajiri-Cho, Yokkaichi, Mie 510-8552, Japan.
J Am Soc Mass Spectrom. 2024 Dec 4;35(12):3057-3062. doi: 10.1021/jasms.4c00310. Epub 2024 Oct 12.
Interpreting time-of-flight secondary ion mass spectrometry (ToF-SIMS) data for organic materials is complicated because each molecule produces many fragment ions and mass peaks from different molecules can overlap. Fragmentation mechanisms in SIMS are complex because different sputtering and ionization processes can occur simultaneously. A prediction system that can identify the materials in a sample is therefore required. We developed a novel prediction system for peptides based on ToF-SIMS spectra and amino-acid-based teaching information (labels) for supervised machine learning. To extend such a system to general organic materials, annotating the materials is crucial to creating effective labels for supervised learning. Peptides are built from 20 kinds of amino acid residues, which can be used as labels. We previously developed a peptide prediction system using Random Forest, a supervised machine-learning method. However, that system predicted only which amino acids the target peptide contained and could not infer the amino acid sequence. In this study, the amino acid sequence of the test peptide was determined by adding information on pairs of adjacent amino acids to the labels. Once the prediction system had learned the spectra of a target peptide, that peptide could be identified in newly acquired ToF-SIMS spectra. The new prediction system also provides useful information for identifying unknown peptides. The prediction results indicate that ordered pairs of adjacent amino acids are effective teaching information for expressing the amino acid sequence of a peptide.
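The labeling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `LABELS`, `peptide_labels`, and `active_labels` are hypothetical, and the encoding (a multi-hot vector over the 20 standard residues plus all 400 ordered adjacent pairs) is an assumption about how such labels might be laid out. The sketch only shows why residue labels alone cannot distinguish two peptides with the same composition, while adjacent-pair labels can.

```python
from itertools import product

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical label vocabulary: 20 single residues, then 400 ordered pairs.
SINGLE_LABELS = list(AMINO_ACIDS)
PAIR_LABELS = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
LABELS = SINGLE_LABELS + PAIR_LABELS


def peptide_labels(sequence: str) -> list[int]:
    """Return a multi-hot vector marking residues and ordered adjacent pairs."""
    unknown = set(sequence) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"unknown residue codes: {sorted(unknown)}")
    # Single residues present in the peptide (composition only).
    present = set(sequence)
    # Ordered adjacent pairs, e.g. "GAV" gives "GA" and "AV" (sequence information).
    present.update(sequence[i:i + 2] for i in range(len(sequence) - 1))
    return [1 if label in present else 0 for label in LABELS]


def active_labels(vector: list[int]) -> list[str]:
    """List the label names that are switched on in a multi-hot vector."""
    return [label for label, flag in zip(LABELS, vector) if flag]


if __name__ == "__main__":
    # Same composition, reversed order: residue labels match, pair labels differ.
    print(active_labels(peptide_labels("GAV")))  # ['A', 'G', 'V', 'AV', 'GA']
    print(active_labels(peptide_labels("VAG")))  # ['A', 'G', 'V', 'AG', 'VA']
```

Vectors of this kind could serve as targets for a multi-label classifier such as Random Forest, with the ToF-SIMS peak intensities as input features. That training step is left out here because the abstract does not describe its details.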