Institute for Resources, Environment and Sustainability, The University of British Columbia, 2202 Main Mall, Vancouver, BC, V6T 1Z4, Canada.
School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Byrom Street, Liverpool, L3 3AF, UK.
Regul Toxicol Pharmacol. 2024 Dec;154:105716. doi: 10.1016/j.yrtph.2024.105716. Epub 2024 Oct 10.
Although uncertainties expressed in the text of QSAR studies can guide quantitative uncertainty estimation, they are often overlooked during uncertainty analysis. Using neurotoxicity as an example, this study developed a method to support the analysis of implicitly and explicitly expressed uncertainties in QSAR modeling studies. Text content analysis was used to identify implicit and explicit uncertainty indicators. The uncertainties within the indicator-containing sentences were then identified and systematically categorized according to 20 uncertainty sources. Our results show that implicit uncertainty was more frequent in most uncertainty sources (13/20), whereas explicit uncertainty was more frequent in only three, indicating that uncertainty in the field is predominantly expressed implicitly. The most frequently cited sources included Mechanistic plausibility, Model relevance and Model performance, suggesting that these are the sources of greatest concern. Other sources, such as Data balance, were not mentioned at all, even though the broader QSAR literature recognizes data balance as an area of concern. This demonstrates that the output of this type of analysis must be interpreted in the context of the broader QSAR literature before conclusions are drawn. Overall, the method established here can be applied in other QSAR modeling contexts and can ultimately guide efforts to address the identified uncertainty sources.
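The indicator-based screening described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. The indicator word lists (`EXPLICIT_INDICATORS`, `IMPLICIT_INDICATORS`) are hypothetical placeholders, because the abstract does not give the study's actual vocabulary. The helper names are illustrative only. Assigning each flagged sentence to one of the 20 uncertainty sources is represented as already-coded input, since the abstract does not say whether or how that step was automated.

```python
import re
from collections import Counter

# HYPOTHETICAL indicator lists for illustration only; the study's actual
# indicator vocabulary is not stated in the abstract.
EXPLICIT_INDICATORS = {"uncertain", "uncertainty", "uncertainties",
                       "unclear", "unknown", "unreliable"}
IMPLICIT_INDICATORS = {"may", "might", "could", "possibly", "likely",
                       "suggest", "suggests", "appear", "appears"}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def classify_sentence(sentence: str) -> set[str]:
    """Return the indicator types ('explicit', 'implicit') found in a sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    found = set()
    if words & EXPLICIT_INDICATORS:
        found.add("explicit")
    if words & IMPLICIT_INDICATORS:
        found.add("implicit")
    return found


def indicator_sentences(text: str) -> list[tuple[str, set[str]]]:
    """Keep only sentences that contain at least one uncertainty indicator."""
    result = []
    for sentence in split_sentences(text):
        kinds = classify_sentence(sentence)
        if kinds:
            result.append((sentence, kinds))
    return result


def dominant_expression(coded: list[tuple[str, str]]) -> dict[str, str]:
    """Given (source, kind) codes, report for each uncertainty source whether
    implicit or explicit expression is more frequent, or a tie."""
    counts: dict[str, Counter] = {}
    for source, kind in coded:
        counts.setdefault(source, Counter())[kind] += 1
    summary = {}
    for source, c in counts.items():
        if c["implicit"] > c["explicit"]:
            summary[source] = "implicit"
        elif c["explicit"] > c["implicit"]:
            summary[source] = "explicit"
        else:
            summary[source] = "tie"
    return summary


if __name__ == "__main__":
    text = ("The model may not capture all mechanisms. Predictions were accurate. "
            "The applicability domain remains uncertain.")
    for sentence, kinds in indicator_sentences(text):
        print(sorted(kinds), sentence)
```

Running the usage block flags the first sentence as implicit (via "may") and the third as explicit (via "uncertain"). The second sentence is dropped because it contains no indicator. A keyword match is a deliberately crude stand-in. Real text content analysis would need curated indicator lists and human review of context, for example to exclude "may" used as a month.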