Floris Matteo, Raitano Giuseppa, Medda Ricardo, Benfenati Emilio
CRS4 - Center for advanced studies, research and development in Sardinia, Loc. Piscina Manna, Building 1, 09010, Pula (CA), Italy.
Department of Biomedical Sciences, University of Sassari, Sassari, Italy.
Mol Inform. 2017 Jul;36(7). doi: 10.1002/minf.201600133. Epub 2016 Dec 29.
Identifying structural alerts is one of the simplest ways to flag potentially toxic chemical compounds. Structural alerts help to quickly identify chemicals that should either be prioritized for testing or eliminated from further consideration and use. In recent years, larger datasets, often assembled through collaborative efforts and competitions, have provided the raw material needed to identify new and more accurate structural alerts. This work applied a method to efficiently mine large toxicological datasets for structural alerts showing a strong statistical association with mutagenicity. In detail, we processed a large Ames mutagenicity dataset comprising 14,015 unique molecules obtained by joining different data sources. After correction for multiple testing, we were able to assign a probability value to each fragment. A total of 51 rules were identified with a p-value < 0.05. Using the same method, we also confirmed the statistical significance of several mutagenicity rules already present and widely recognized in the literature. In addition, we extended the application of our method by predicting the mutagenicity of an external data set.
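The abstract does not name the statistical test or the multiple-testing correction that was used. Purely as an illustrative sketch, and assuming a one-sided Fisher exact test per fragment followed by a Benjamini-Hochberg adjustment (both are assumptions, not the authors' stated method), scoring candidate fragments against Ames outcomes could look like the following. The fragment names and counts are made-up toy values, and the function names are hypothetical.

```python
from math import comb


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher exact p-value for enrichment of a fragment among mutagens.

    a: mutagenic molecules containing the fragment
    b: non-mutagenic molecules containing the fragment
    c: mutagenic molecules without the fragment
    d: non-mutagenic molecules without the fragment
    """
    n = a + b + c + d
    with_fragment = a + b
    mutagenic = a + c
    denom = comb(n, with_fragment)
    upper = min(with_fragment, mutagenic)
    # Hypergeometric tail: probability of seeing at least `a` mutagens
    # among the molecules that contain the fragment.
    tail = sum(
        comb(mutagenic, k) * comb(n - mutagenic, with_fragment - k)
        for k in range(a, upper + 1)
    )
    return tail / denom


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy contingency counts (a, b, c, d) per fragment; NOT data from the paper.
toy_counts = {
    "aromatic nitro": (40, 5, 60, 95),
    "carboxylic acid": (10, 12, 90, 88),
}

names = list(toy_counts)
raw = [fisher_enrichment_p(*toy_counts[name]) for name in names]
adjusted = dict(zip(names, benjamini_hochberg(raw)))

# Keep fragments whose adjusted p-value falls below the 0.05 threshold.
alerts = [name for name in names if adjusted[name] < 0.05]
print(alerts)
```

Only the standard library is used so the sketch stays self-contained. In practice the contingency counts would come from substructure matching over the full dataset.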