Yang Cheng, Zhang Ao, Gao Zhan-Qi, Su Guan-Yong
Jiangsu Province Key Laboratory of Chemical Pollution Control and Resources Reuse, School of Environmental and Biological Engineering, Nanjing University of Science and Technology, Nanjing 210094, China.
Key Laboratory of Environment Monitoring and Analysis for Organic Pollutants in Surface Water, Ministry of Ecology and Environment, Jiangsu Province Environmental Monitoring Center, Nanjing 210019, China.
Se Pu. 2025 Jun;43(6):585-593. doi: 10.3724/SP.J.1123.2025.01019.
Biological and environmental samples are complex and contain a highly diverse range of compounds. Analyzing these samples by chromatography coupled with high-resolution mass spectrometry generates a substantial volume of data composed of mass-to-charge ratio (m/z), retention time (RT), and peak-intensity information, which takes considerable time and effort to process. Software for processing mass-spectrometry data is therefore essential for compound identification and analysis. Among the many data-processing options, XCMS (various forms (X) of chromatography mass spectrometry), an efficient, precise, and freely accessible package, is broadly used in environmental science. This study aimed to explore the use of XCMS in environmental science applications by comprehensively reviewing its workflow, underlying principles, and parameter-optimization measures. The workflow mainly comprises importing, processing, and exporting data. Importing data requires a format-conversion tool such as MSConvert, which converts data generated by various instruments into a format that XCMS accepts, while data processing comprises peak detection, peak alignment, and peak filling. XCMS implements these functions through built-in algorithms, of which Matched Filter, CentWave, Obiwarp, and Peak Density are the most commonly used. The first two implement peak detection, while the latter two implement peak alignment. During peak detection, XCMS identifies compound peaks in the mass-spectrometry data: it first filters noise and corrects the baseline, and an algorithm then detects peaks based on their shapes and intensities. XCMS can also de-emphasize and de-distort to filter out interfering information in each peak signal.
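The peak-detection idea described above can be illustrated with a minimal Python sketch. It is not the XCMS implementation (XCMS is an R package); it only assumes that matched filtering means convolving a chromatogram with a model peak shape, here a zero-sum second-derivative Gaussian, and keeping local maxima that exceed a signal-to-noise threshold. The synthetic chromatogram and the names `mexican_hat_kernel`, `detect_peaks`, `sigma`, and `snr_threshold` are hypothetical choices made for this illustration.

```python
import numpy as np

def mexican_hat_kernel(sigma, half_width):
    """Negated second derivative of a Gaussian, shifted so it sums to zero."""
    t = np.arange(-half_width, half_width + 1, dtype=float)
    kernel = (1.0 - (t / sigma) ** 2) * np.exp(-0.5 * (t / sigma) ** 2)
    return kernel - kernel.mean()  # zero sum: a flat baseline filters to zero

def detect_peaks(intensity, sigma=5.0, snr_threshold=10.0):
    """Return scan indices of filtered local maxima above snr_threshold x noise."""
    half_width = int(4 * sigma)
    kernel = mexican_hat_kernel(sigma, half_width)
    # Pad with edge values so the baseline does not create artifacts at the ends.
    padded = np.pad(intensity, half_width, mode="edge")
    filtered = np.convolve(padded, kernel, mode="valid")
    # Robust noise estimate: median absolute deviation scaled to a standard deviation.
    noise = 1.4826 * np.median(np.abs(filtered - np.median(filtered)))
    inner = filtered[1:-1]
    is_max = (inner > filtered[:-2]) & (inner >= filtered[2:])
    candidates = np.flatnonzero(is_max) + 1
    return [int(i) for i in candidates if filtered[i] > snr_threshold * noise]

# Synthetic (assumed) chromatogram: constant baseline, Gaussian noise, three peaks.
rng = np.random.default_rng(0)
scans = np.arange(600)
true_apexes = [150, 300, 450]
heights = [1000.0, 300.0, 600.0]
signal = 50.0 + rng.normal(0.0, 5.0, scans.size)
for apex, height in zip(true_apexes, heights):
    signal += height * np.exp(-0.5 * ((scans - apex) / 5.0) ** 2)

detected = detect_peaks(signal)
print("detected apex scans:", detected)
```

Because the kernel sums to zero, the constant baseline contributes nothing to the filtered trace, so baseline removal and peak-shape matching happen in one pass. In this sketch the kernel width is set equal to the simulated peak width; with real data that width would have to be tuned to the chromatography.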
The CentWave algorithm is particularly effective for high-resolution mass-spectrometry data, improving both detection accuracy and recall. Peak detection is followed by alignment. Here, XCMS uses kernel density estimation to match peaks between samples by estimating the retention-time distribution of matched peaks, which corrects for nonlinear deviations in retention time. This step is critical for accurately comparing samples. The peak-filling step resolves missing peaks: XCMS uses information from other samples to fill these gaps, which improves the completeness of the dataset and the accuracy of the analysis. In terms of applications, XCMS has enabled significant progress in the non-targeted screening of environmental pollutants, the identification of metabolic transformations of exogenous pollutants, and the exploration of the endogenous metabolism of biomolecules. For example, XCMS efficiently extracts mass-spectral features from complex samples during the non-targeted screening of environmental pollutants, thereby providing a reliable dataset for subsequent identification. Although XCMS has had some success in the environmental science field, limitations remain, including heavy memory use, software crashes when handling large-scale data, and the misclassification of noise as valid signals during feature detection, which leads to large numbers of false positives, errors, and missed detections when processing data for compounds with complex chemical compositions and structural types. In addition, user interaction and the degree of automation require further improvement. XCMS nevertheless offers significant developmental potential in the environmental science field.
Continued algorithmic optimization and database expansion, through improvements in algorithmic robustness, data compatibility, and user experience, are expected to broaden the use of XCMS and provide more powerful support for the environmental science field in the future.
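The density-based matching of peaks between samples, and the gaps that peak filling is meant to resolve, can be sketched in Python as follows. This is an illustration under stated assumptions, not the XCMS Peak Density code: it assumes that peaks within one narrow m/z slice are grouped around maxima of a Gaussian kernel density estimate of their retention times. The peak list, the function `group_by_density`, and the parameters `bandwidth` and `grid_step` are hypothetical.

```python
import numpy as np

# Hypothetical peaks from one narrow m/z slice: (sample, retention time in seconds).
peaks = [
    ("A", 120.4), ("B", 121.1), ("C", 119.8),  # one compound, slight RT drift
    ("A", 180.2), ("B", 181.0),                # second compound, not found in C
]

def group_by_density(peaks, bandwidth=2.0, grid_step=0.1):
    """Group peaks around maxima of a kernel density estimate of retention time."""
    samples = sorted({s for s, _ in peaks})
    rts = np.array([rt for _, rt in peaks])
    grid = np.arange(rts.min() - 3 * bandwidth, rts.max() + 3 * bandwidth, grid_step)
    # Unnormalized Gaussian kernel density of retention times on the grid.
    density = np.exp(-0.5 * ((grid[:, None] - rts[None, :]) / bandwidth) ** 2).sum(axis=1)
    inner = density[1:-1]
    is_max = (inner > density[:-2]) & (inner >= density[2:])
    centers = grid[1:-1][is_max]  # ascending, because the grid is ascending
    groups = [{"rt_center": float(c), "members": []} for c in centers]
    # Assign each peak to the nearest density maximum within one bandwidth.
    for sample, rt in peaks:
        nearest = int(np.argmin(np.abs(centers - rt)))
        if abs(centers[nearest] - rt) <= bandwidth:
            groups[nearest]["members"].append((sample, rt))
    # Samples with no peak in a group are the candidates for peak filling.
    for group in groups:
        present = {s for s, _ in group["members"]}
        group["missing"] = [s for s in samples if s not in present]
    return groups

groups = group_by_density(peaks)
for group in groups:
    print(round(group["rt_center"], 1), group["members"], "missing:", group["missing"])
```

In this toy input, sample C has no peak for the second compound, so it appears in that group's `missing` list. A filling step would then re-examine the raw data of that sample in the group's retention-time window rather than leave the value empty.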