Meyer David E, Cashman Sarah, Gaglione Anthony
Center for Environmental Solutions and Emergency Response, U.S. Environmental Protection Agency, Cincinnati, Ohio.
Eastern Research Group, Inc., Lexington, Massachusetts.
J Ind Ecol. 2021 Feb 1;25(1):20-35. doi: 10.1111/jiec.13044.
This study proposes methods to improve data mining workflows for modeling chemical manufacturing life cycle inventory. Secondary data sources can provide valuable information about environmental releases during chemical manufacturing. However, because the data are often reported at the facility level, they are difficult to apply to specific processes, which can reduce the quality of the resulting inventory. First, a thorough data source analysis is performed to establish data quality scoring and to create filtering rules that resolve data selection issues when sources or species overlap. A method is then introduced to develop context-based filter rules that leverage process metadata within data sources to improve how facility air releases are attributed to specific processes and to increase the technological correlation and completeness of the inventory. Finally, a sanitization method is demonstrated that improves data quality by minimizing the exclusion of confidential business information (CBI). The viability of the methods is explored using case studies of cumene and sodium hydroxide production in the United States. Attributing air releases using process context enables more sophisticated filtering to remove unnecessary flows from the inventory. The ability to sanitize and incorporate CBI is promising because it increases the sample size, and therefore the representativeness, of geographically averaged inventories. Future work will focus on expanding the application of context-based data filtering to other types and sources of environmental data.
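As an illustration of the first step, the sketch below shows one way a filtering rule might resolve overlaps when two data sources report the same species for the same facility. It is not the authors' implementation: the `Release` record, the source names, the release values, and the convention that a lower `quality_score` means better data are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """One reported release (hypothetical record layout)."""
    facility: str
    species: str
    source: str           # hypothetical data source label
    kg_per_year: float
    quality_score: int    # assumption: lower score = better data quality


def resolve_overlaps(records):
    """Keep one record per (facility, species), preferring the best score.

    Ties keep the record seen first, so input order acts as a
    secondary preference among equally scored sources.
    """
    best = {}
    for rec in records:
        key = (rec.facility, rec.species)
        current = best.get(key)
        if current is None or rec.quality_score < current.quality_score:
            best[key] = rec
    return sorted(best.values(), key=lambda r: (r.facility, r.species))


# Made-up example data: two sources overlap on benzene at facility F1.
records = [
    Release("F1", "benzene", "SourceA", 120.0, 3),
    Release("F1", "benzene", "SourceB", 110.0, 2),
    Release("F1", "cumene", "SourceA", 40.0, 3),
]

selected = resolve_overlaps(records)
for rec in selected:
    print(rec.facility, rec.species, rec.source, rec.kg_per_year)
```

Selecting a single record per facility and species avoids double counting a release that appears in more than one source; a real rule set would likely also need to handle species that are reported under different names or groupings across sources.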