Laboratory of Environmental Chemistry and Toxicology, IRCCS - Istituto di Ricerche Farmacologiche Mario Negri, Via La Masa 19, 20156 Milano, Italy; Dipartimento di Farmacia - Scienze del Farmaco, Università degli Studi di Bari "Aldo Moro", Via E. Orabona 4, 70125 Bari, Italy.
Laboratory of Environmental Chemistry and Toxicology, IRCCS - Istituto di Ricerche Farmacologiche Mario Negri, Via La Masa 19, 20156 Milano, Italy.
Environ Res. 2015 Feb;137:398-409. doi: 10.1016/j.envres.2014.12.019. Epub 2015 Jan 21.
The bioconcentration factor (BCF) is an important bioaccumulation hazard assessment metric in many regulatory contexts. Its assessment is required by the REACH regulation (Registration, Evaluation, Authorization and Restriction of Chemicals) and by CLP (Classification, Labeling and Packaging). We challenged nine well-known and widely used BCF QSAR models against 851 compounds stored in a purpose-built database. The goodness of the regression was assessed with the coefficient of determination (R²) and the root mean square error (RMSE). To assess classification, Cooper's statistics and the Matthews correlation coefficient (MCC) were calculated for all thresholds relevant for regulatory purposes (i.e. 100 L/kg for Chemical Safety Assessment; 500 L/kg for Classification and Labeling; 2000 and 5000 L/kg for Persistent, Bioaccumulative and Toxic (PBT) and very Persistent, very Bioaccumulative (vPvB) assessment), with particular attention to the models' ability to limit false negatives. As a first step, statistical analysis was performed on the predictions for the entire dataset; R²>0.70 was obtained with the CORAL, T.E.S.T. and EPISuite Arnot-Gobas models. As classifiers, the ACD and logP-based equations were the best in terms of sensitivity, which ranged from 0.75 to 0.94. External compound predictions were carried out for the models that had their own training sets. The CORAL model returned the best performance (R²ext=0.59), followed by the EPISuite Meylan model (R²ext=0.58). The latter also gave the highest sensitivity on external compounds, with values from 0.55 to 0.85 depending on the threshold. Statistics were also compiled for compounds falling within each model's Applicability Domain (AD), which gave better performance. In this respect, VEGA CAESAR was the best model in terms of both regression (R²=0.94) and classification (average sensitivity>0.80). This model also showed the best regression (R²=0.85) and sensitivity (average>0.70) for new compounds inside the AD but not present in the training set. However, no single optimal model exists, so a case-by-case assessment would be wise. Integrating the wealth of information from multiple models remains the winning approach.
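As a rough illustration of the evaluation metrics named in the abstract, the following Python sketch computes R² and RMSE for a regression, and Cooper's statistics (sensitivity, specificity, accuracy) plus MCC at one regulatory threshold. It is not the authors' code. The function names, the use of R² = 1 − SS_res/SS_tot (rather than a squared correlation), the convention that a compound is "positive" when its BCF is at or above the threshold, and the example values are all assumptions made for illustration only.

```python
import math

# Regulatory BCF thresholds in L/kg, as listed in the abstract.
THRESHOLDS = (100, 500, 2000, 5000)


def regression_stats(observed, predicted):
    """Return (R2, RMSE) for paired observed/predicted values (e.g. log10 BCF).

    Assumption: R2 is computed as 1 - SS_res / SS_tot.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / n)


def classification_stats(observed_bcf, predicted_bcf, threshold):
    """Return Cooper's statistics and MCC for one threshold (BCF in L/kg).

    Assumption: a compound counts as positive when BCF >= threshold.
    """
    tp = fp = tn = fn = 0
    for obs, pred in zip(observed_bcf, predicted_bcf):
        obs_pos, pred_pos = obs >= threshold, pred >= threshold
        if obs_pos and pred_pos:
            tp += 1
        elif obs_pos:
            fn += 1  # false negative: a bioaccumulative compound was missed
        elif pred_pos:
            fp += 1
        else:
            tn += 1
    nan = float("nan")
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical BCF values in L/kg, invented purely to exercise the functions.
    observed = [50, 150, 600, 3000, 6000]
    predicted = [80, 90, 700, 1500, 7000]
    r2, rmse = regression_stats(
        [math.log10(v) for v in observed],
        [math.log10(v) for v in predicted],
    )
    print(f"R2={r2:.2f} RMSE={rmse:.2f}")
    for t in THRESHOLDS:
        print(t, classification_stats(observed, predicted, t))
```

Sensitivity is the quantity to watch when false negatives matter most, since it is the fraction of truly above-threshold compounds that the model also places above the threshold.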