Hu Ximin, Mar Derek, Suzuki Nozomi, Zhang Bowei, Peter Katherine T, Beck David A C, Kolodziej Edward P
Center for Urban Waters, University of Washington Tacoma, Tacoma, WA, 98421, USA.
Department of Civil and Environmental Engineering, University of Washington, Seattle, WA, 98195, USA.
J Cheminform. 2023 Sep 23;15(1):87. doi: 10.1186/s13321-023-00741-9.
Mass-Suite (MSS) is a Python-based, open-source software package designed to analyze high-resolution mass spectrometry (HRMS)-based non-targeted analysis (NTA) data, particularly for water quality assessment and other environmental applications. MSS provides flexible, user-defined workflows for HRMS data processing and analysis. These include basic functions (e.g., feature extraction, data reduction, feature annotation, data visualization, and statistical analyses) as well as advanced exploratory data mining and predictive modeling capabilities that are not provided by currently available open-source software (e.g., unsupervised clustering analyses and a machine learning-based source tracking and apportionment tool). As a key advance, most core MSS functions are supported by machine learning algorithms (e.g., clustering algorithms and predictive modeling algorithms) to improve their accuracy and/or efficiency. MSS reliability was validated with mixed chemical standards of known composition, yielding 99.5% feature extraction accuracy and approximately 52% overlap of extracted features relative to other open-source software tools. Example use cases of laboratory data evaluation are provided to illustrate MSS functionalities and demonstrate reliability. MSS expands the available HRMS data analysis workflows for water quality evaluation and environmental forensics, and is readily integrated with existing capabilities. Because MSS is an open-source package, we anticipate further development of improved data analysis capabilities in collaboration with interested users.
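To make the feature-overlap figure in the abstract concrete, the following minimal sketch shows one common way such an overlap could be computed between the feature lists of two tools: a feature counts as shared when its m/z agrees within a ppm tolerance and its retention time agrees within a fixed window. This is an illustration only and is not the MSS implementation or API. The function names, the `(m/z, retention time)` tuple layout, the tolerance values (10 ppm, 0.2 min), and the example features are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical feature representation: (m/z, retention time in minutes).
Feature = Tuple[float, float]


def is_match(a: Feature, b: Feature, ppm_tol: float = 10.0, rt_tol: float = 0.2) -> bool:
    """Return True if two features agree within the assumed m/z and RT tolerances."""
    mz_a, rt_a = a
    mz_b, rt_b = b
    ppm_diff = abs(mz_a - mz_b) / mz_a * 1e6  # relative m/z difference in ppm
    return ppm_diff <= ppm_tol and abs(rt_a - rt_b) <= rt_tol


def overlap_fraction(
    reference: List[Feature],
    candidate: List[Feature],
    ppm_tol: float = 10.0,
    rt_tol: float = 0.2,
) -> float:
    """Fraction of reference features that have at least one match in candidate."""
    if not reference:
        return 0.0
    matched = sum(
        1
        for ref in reference
        if any(is_match(ref, cand, ppm_tol, rt_tol) for cand in candidate)
    )
    return matched / len(reference)


# Made-up feature lists standing in for the output of two different tools.
tool_a = [(195.0877, 3.10), (301.1410, 7.45), (152.0706, 2.00), (455.2900, 12.30)]
tool_b = [(195.0879, 3.12), (301.1500, 7.44), (152.0707, 2.05)]

print(f"Overlap relative to tool A: {overlap_fraction(tool_a, tool_b):.2f}")
```

Note that the result depends on which list is treated as the reference and on the chosen tolerances, so a reported overlap percentage is only meaningful alongside those settings.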