Cheng Fei, Li Huizhen, Brooks Bryan W, You Jing
Guangdong Key Laboratory of Environmental Pollution and Health, School of Environment, Jinan University, Guangzhou 511443, China.
Department of Environmental Science, Institute of Biomedical Studies, Center for Reservoir and Aquatic Systems Research, Baylor University, Waco, Texas 76798, United States.
Environ Sci Technol. 2021 Jul 6;55(13):8977-8986. doi: 10.1021/acs.est.1c00152. Epub 2021 Jun 18.
Selection of toxicity endpoints affects the outcomes of risk assessment. For designing bioassay batteries, scientific decisions based on more holistic evidence are preferable to subjective selections, particularly when systems are poorly understood. Here, we propose a novel event-driven taxonomy (EDT)-based text mining tool to prioritize stressors likely to elicit water quality deterioration. The tool integrated automated literature collection, natural language processing using adverse outcome pathway (AOP)-based toxicological terminology, and machine learning to classify event drivers (EDs). From aquatic toxicity assessments within China over the past decade, we gathered more than 14 000 sources of information. Using a dictionary of 1039 toxicological terms, we mapped 15 bioassay-related modes of action, yet fewer than half of the bioassays could be elucidated by available AOPs. To fill these mechanistic knowledge gaps, we developed a Naïve Bayesian ED classifier to annotate apical responses. The classifier reached 74% accuracy in 4-fold cross-validation and assigned 85% of the bioassays to 26 EDs. Narcosis, estrogen receptor mediators, and aryl hydrocarbon receptor mediators were the major EDs in aquatic systems across China, whereas individual regions had distinct ED fingerprints. The EDT-based tool provides a promising diagnostic strategy to inform region-specific bioassay design and selection for water quality assessments in the big data era.
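The abstract names the classifier type (Naïve Bayes) and the validation scheme (4-fold cross-validation) but not the features, smoothing, or fold assignment. As a minimal sketch, assuming a multinomial naive Bayes over counts of dictionary terms with Laplace smoothing, the Python below illustrates how dictionary-filtered text could be classified into event drivers and scored by 4-fold cross-validation. The 12-term dictionary, the text snippets, and all names (`extract_terms`, `NaiveBayesClassifier`, `k_fold_accuracy`, `DOCS`, `LABELS`) are hypothetical; they are not the authors' 1039-term dictionary, corpus, or code.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical 12-term dictionary, four terms per event driver (ED).
# Illustrative only; NOT the 1039-term AOP-based dictionary used in the study.
TERM_DICTIONARY = {
    "narcosis", "baseline", "membrane", "lipophilic",  # narcosis
    "estrogen", "vitellogenin", "estradiol", "er",     # ER-mediated
    "ahr", "cyp1a", "erod", "dioxin",                  # AhR-mediated
}

# Hypothetical snippets standing in for bioassay descriptions, labeled in a
# repeating narcosis / ER / AhR cycle.
DOCS = [
    "baseline toxicity consistent with narcosis",
    "estrogen activity with vitellogenin induction",
    "ahr activation by dioxin",
    "narcosis of lipophilic compounds",
    "er agonist estradiol equivalents",
    "cyp1a induction measured by erod",
    "membrane disruption and narcosis",
    "estrogen er binding",
    "ahr mediated erod activity",
    "baseline narcosis in fish",
    "vitellogenin and estradiol levels",
    "dioxin like cyp1a response",
]
LABELS = ["narcosis", "ER-mediated", "AhR-mediated"] * 4


def extract_terms(text):
    """Lower-case, split into alphanumeric tokens, keep only dictionary terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t in TERM_DICTIONARY]


class NaiveBayesClassifier:
    """Multinomial naive Bayes over dictionary-term counts (Laplace smoothing)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(extract_terms(doc))
        self.vocab_size = len(TERM_DICTIONARY)
        return self

    def predict(self, doc):
        terms = extract_terms(doc)
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            total = sum(self.term_counts[label].values())
            # Log prior plus the smoothed log likelihood of each observed term.
            score = math.log(count / n_docs)
            for t in terms:
                score += math.log((self.term_counts[label][t] + self.alpha)
                                  / (total + self.alpha * self.vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def k_fold_accuracy(docs, labels, k=4):
    """Mean accuracy over k folds; fold i holds out every k-th document from i."""
    if len(docs) < k:
        raise ValueError("need at least k documents")
    accuracies = []
    for i in range(k):
        test_idx = set(range(i, len(docs), k))
        train_docs = [d for j, d in enumerate(docs) if j not in test_idx]
        train_labels = [l for j, l in enumerate(labels) if j not in test_idx]
        model = NaiveBayesClassifier().fit(train_docs, train_labels)
        correct = sum(model.predict(docs[j]) == labels[j] for j in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


if __name__ == "__main__":
    print(k_fold_accuracy(DOCS, LABELS, k=4))  # 1.0 on this toy set
    model = NaiveBayesClassifier().fit(DOCS, LABELS)
    print(model.predict("vitellogenin induction after estradiol exposure"))  # ER-mediated
```

Restricting features to dictionary terms mirrors the use of a controlled, AOP-based vocabulary, and Laplace smoothing keeps a single unseen term from zeroing out a class. The interleaved fold assignment is used here only to make the toy example deterministic; a shuffled, stratified split would be more typical on real data, where accuracy would of course fall below the perfect score this separable toy set yields.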