Abrehdari Seyed Hossein
University of Tehran, Tehran, Iran.
Institute of Geophysics and Engineering Seismology, National Academy of Sciences, Armenia.
MethodsX. 2025 Jun 12;15:103428. doi: 10.1016/j.mex.2025.103428. eCollection 2025 Dec.
This study investigates the development and implementation of a seismic database using process mining techniques. Seismic waveform data of this kind are generated and stored at seismological centers such as the U.S. Geological Survey (USGS). Process mining was used to analyze the workflow of requesting, collecting, preparing, and delivering a database of almost 900 earthquake waveform records (treated here as big data). The data were obtained from the USGS for a region of 388,111.5 km² between 44°-51°E and 38°-42.5°N, covering the period from 1999 to 2018. The findings indicate that applying process mining reduces the time needed to create the database (request, collection, preparation, and delivery) from 25 days with manual processing to approximately 8 days. In parallel, custom-built software scripts were deployed as unattended tools to automate the most time-consuming phases of database creation. The approach can help seismological centers and other data centers reduce the time needed to create, store, and deliver databases, at a time when efficient management of large scientific datasets is increasingly vital.
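As an illustration of the kind of workflow analysis described above, the following minimal sketch derives directly-follows counts and per-case throughput times from an event log, two basic building blocks of process mining. It is not the author's code: the event log, the case identifiers, the timestamps, and all function names are hypothetical and chosen only for illustration. Only the four stage names (request, collection, preparation, delivery) and the 25-day and 8-day figures come from the abstract.

```python
from collections import Counter
from datetime import datetime

# Hypothetical event log: (case_id, activity, date). Every value here is
# invented for illustration; none of it is data from the study.
EVENT_LOG = [
    ("req-1", "request", "2020-01-01"),
    ("req-1", "collection", "2020-01-03"),
    ("req-1", "preparation", "2020-01-06"),
    ("req-1", "delivery", "2020-01-09"),
    ("req-2", "request", "2020-01-02"),
    ("req-2", "collection", "2020-01-05"),
    ("req-2", "preparation", "2020-01-12"),
    ("req-2", "delivery", "2020-01-14"),
]


def build_traces(log):
    """Group events by case and order each case's events by timestamp."""
    traces = {}
    for case_id, activity, date in log:
        stamp = datetime.strptime(date, "%Y-%m-%d")
        traces.setdefault(case_id, []).append((stamp, activity))
    for events in traces.values():
        events.sort()
    return traces


def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    counts = Counter()
    for events in traces.values():
        for (_, a), (_, b) in zip(events, events[1:]):
            counts[(a, b)] += 1
    return counts


def throughput_days(traces):
    """Days elapsed between the first and last event of each case."""
    return {
        case: (events[-1][0] - events[0][0]).days
        for case, events in traces.items()
    }


def percent_reduction(before, after):
    """Relative reduction in percent, e.g. for total processing time."""
    return 100.0 * (before - after) / before


traces = build_traces(EVENT_LOG)
print(directly_follows(traces))
print(throughput_days(traces))
# Figures reported in the abstract: 25 days manual, about 8 days afterwards.
print(percent_reduction(25, 8))
```

Applied to the figures reported in the abstract, `percent_reduction(25, 8)` gives a reduction of 68% in database creation time. A real analysis would typically read the event log from workflow records rather than a hard-coded list.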