Dozier Jeff, Frew James
Donald Bren School of Environmental Science and Management, University of California, Santa Barbara, CA 93106-5131, USA.
Philos Trans A Math Phys Eng Sci. 2009 Mar 13;367(1890):1021-33. doi: 10.1098/rsta.2008.0187.
Computational provenance, a record of the antecedents and processing history of digital information, is key to properly documenting computer-based scientific research. To support investigations in hydrologic science, we produce daily fractional snow-covered area from NASA's Moderate Resolution Imaging Spectroradiometer (MODIS). From the MODIS reflectance data in seven wavelengths, we estimate the fraction of each 500 m pixel that is covered by snow. The daily products have data gaps and errors because of cloud cover and sensor viewing geometry, so we interpolate and smooth to produce our best estimate of the daily snow cover. To manage the data, we have developed the Earth System Science Server (ES3), a software environment for data-intensive Earth science, with unique capabilities for automatically and transparently capturing and managing the provenance of arbitrary computations. Because acquisition is transparent, scientists do not have to express their computations in specific languages or schemas for provenance to be acquired and maintained. ES3 models provenance as relationships between processes and their input and output files. It is particularly suited to capturing the provenance of an evolving algorithm whose components span multiple languages and execution environments.
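The provenance model described in the abstract (relationships between processes and their input and output files) can be illustrated with a minimal Python sketch. This is not ES3's actual schema or API: the class `ProvenanceGraph`, its methods `record` and `lineage`, and the process and file names are all hypothetical, chosen only to show how antecedents might be traced once such relationships are stored.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceGraph:
    """Hypothetical store of process-to-file relationships (not ES3's schema)."""
    # process id -> (input files, output files)
    processes: dict = field(default_factory=dict)
    # output file -> id of the process that wrote it
    producer: dict = field(default_factory=dict)

    def record(self, process_id, inputs, outputs):
        """Register one process run with the files it read and wrote."""
        self.processes[process_id] = (tuple(inputs), tuple(outputs))
        for path in outputs:
            self.producer[path] = process_id

    def lineage(self, path):
        """Return the sets of processes and files that are antecedents of path."""
        procs, files = set(), set()
        stack = [path]
        while stack:
            current = stack.pop()
            proc = self.producer.get(current)
            if proc is None or proc in procs:
                continue  # a raw input, or a process already traversed
            procs.add(proc)
            for src in self.processes[proc][0]:
                if src not in files:
                    files.add(src)
                    stack.append(src)
        return procs, files


# Illustrative two-step pipeline; all names below are made up.
graph = ProvenanceGraph()
graph.record("unmix", ["reflectance.hdf"], ["raw_fsca.tif"])
graph.record("smooth", ["raw_fsca.tif", "cloud_mask.tif"], ["daily_fsca.tif"])
procs, files = graph.lineage("daily_fsca.tif")
```

In this sketch, asking for the lineage of the final product walks back through both processes to the original reflectance file, which is the kind of question a provenance record is meant to answer. How ES3 captures these relationships automatically is not shown here.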