Loos Martin, Singer Heinz
Swiss Federal Institute for Aquatic Science and Technology (Eawag), 8600 Dübendorf, Switzerland.
Institute of Biogeochemistry and Pollutant Dynamics, ETH Zürich, Zurich, 8092 Switzerland.
J Cheminform. 2017 Feb 23;9:12. doi: 10.1186/s13321-017-0197-z. eCollection 2017.
A large proportion of polar anthropogenic compounds routinely released into the environment comprises homologue series, i.e., sets of chemicals differing in a repeating chemical unit. Using analytical techniques such as liquid chromatography coupled to high-resolution mass spectrometry (LC-HRMS), these compounds are readily measurable as signal sets with characteristic differences in mass and typically retention time. However, and despite such distinct characteristics, no computational approach for the direct, simultaneous and untargeted detection of all such signal sets comprising both LC and HRMS information has to date been presented.
A fast two-stage approach has been developed to extract LC-HRMS signal patterns that can be indicative of homologous analytes. In the first stage, a k-d tree representation of picked LC-HRMS peaks is used to extract all feasible 3-tuples of peaks, subject to restrictions in, e.g., mass defect differences. The second stage then recombines these 3-tuples into larger series tuples while ensuring smooth changes in their retention time characteristics. This unsupervised approach was evaluated for ten effluent samples from Swiss sewage treatment plants (STPs), in both positive and negative electrospray ionization.
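The two stages described above can be illustrated with a minimal sketch. It is not the published implementation and makes several simplifying assumptions: a sorted list with binary search stands in for the k-d tree, a single known mass difference `dmz` is searched instead of all feasible differences, mass defect restrictions are omitted, and "smooth" retention time behaviour is reduced to a bound on how much consecutive retention time steps may differ. The function names `find_triplets` and `merge_triplets` and the parameters `mz_tol` and `rt_smooth` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def find_triplets(peaks, dmz, mz_tol=0.005):
    """Stage 1 (sketch): find peak triplets whose m/z values step by dmz.

    peaks is a list of (mz, rt) tuples. Returns the peaks sorted by m/z and
    a list of index triplets (i, j, k) into that sorted list.
    """
    if dmz <= mz_tol:
        raise ValueError("dmz must exceed mz_tol")
    peaks = sorted(peaks)
    mzs = [mz for mz, _ in peaks]

    def successors(i):
        # Binary search for peaks at mzs[i] + dmz, within the tolerance.
        target = mzs[i] + dmz
        lo = bisect_left(mzs, target - mz_tol)
        hi = bisect_right(mzs, target + mz_tol)
        return range(lo, hi)

    triplets = [(i, j, k)
                for i in range(len(peaks))
                for j in successors(i)
                for k in successors(j)]
    return peaks, triplets


def merge_triplets(peaks, triplets, rt_smooth=10.0):
    """Stage 2 (sketch): chain triplets that overlap in two peaks.

    A triplet is kept only if its two retention time steps differ by at
    most rt_smooth (an assumed smoothness criterion).
    """
    def smooth(a, b, c):
        step1 = peaks[b][1] - peaks[a][1]
        step2 = peaks[c][1] - peaks[b][1]
        return abs(step2 - step1) <= rt_smooth

    valid = [t for t in triplets if smooth(*t)]
    by_prefix = {}
    for t in valid:
        by_prefix.setdefault(t[:2], []).append(t)

    # A triplet starts a series if no other triplet ends in its first two peaks.
    suffixes = {t[1:] for t in valid}
    starts = [t for t in valid if t[:2] not in suffixes]

    series = []

    def extend(chain):
        continuations = by_prefix.get(chain[-2:], [])
        if not continuations:
            series.append(chain)
        for t in continuations:
            extend(chain + (t[2],))

    for t in starts:
        extend(t)
    return series


# Synthetic demo data: five peaks spaced by 44.0262 (roughly one C2H4O unit)
# plus two unrelated peaks. The values are invented for illustration.
peaks = [(100.0, 300.0), (144.0262, 320.0), (188.0524, 341.0),
         (232.0786, 361.0), (276.1048, 380.0),
         (150.0, 100.0), (210.5, 500.0)]
sorted_peaks, triplets = find_triplets(peaks, dmz=44.0262)
series = merge_triplets(sorted_peaks, triplets)
```

Each returned series is a tuple of indices into `sorted_peaks`. Because a triplet can have several valid continuations, one peak may end up in more than one series, which mirrors the non-unique assignments reported in the results.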
Besides recovering all continuous series of previously identified homologues, substantial fractions of nontargeted peaks could subsequently be assigned to very diverse peak series, although assignments were often not unique. These ambiguities were resolved by a self-organizing map technique, which revealed both distinctive series meshing and rivaling combinatorial solutions in the presence of isobaric or gapped series peaks. When comparing STPs, several series mass differences emerged that were ubiquitous yet in part of low frequency; these may help prioritize future identification efforts. The presented algorithm is freely available as part of the R package and as a user-friendly web interface at www.envihomolog.eawag.ch.
Graphical Abstract: The search for systematic series indicative of homologous compounds is based on a partitioned representation of LC-HRMS signal characteristics. This nontargeted search first extracts series triplets in a nearest-neighbour walk and then recombines them into larger series. For illustration, the two dimensions involving mass defect characteristics are represented by only one.