Document Type: Original Article
Authors
Department of Reclamation of Arid and Mountainous Regions Engineering, Faculty of Natural Resources, University of Tehran, Karaj, Iran
10.22034/iwm.2026.2077954.1254
Abstract
Extended Abstract
Introduction: In recent decades, the growing intensity and frequency of floods driven by climate change, urbanization, and land degradation have become a major challenge in water resource management. Flash floods in the mountainous areas of Iran, including the Taleghan watershed, cause soil erosion, damage infrastructure, and threaten water resources. Accurate prediction of flood discharge and of the hydrological behavior of watersheds is therefore essential for timely decision-making in early warning systems and flood risk management. This study evaluated and compared the performance of three models, XBeach, DBN, and LightGBM, in predicting the maximum monthly flood discharge in the Taleghan watershed.
Materials and methods: The data consisted of monthly discharge time series from hydrometric stations in the region, covering several consecutive years. To evaluate the effect of temporal memory, four seasonal input combinations were designed, using lags of one to four seasons (each season spanning three months). After preprocessing, standardization, and splitting into training and test sets, the data were analyzed with the three models in R. XBeach was used as a process-based physical baseline, while the DBN model was implemented with a dynamic Bayesian network structure and probabilistic hidden layers to represent the temporal dependencies in the data. LightGBM was configured as a gradient-boosted decision tree ensemble with tuned learning parameters and leaf-wise (rather than level-wise) tree growth. Model performance was evaluated with the Nash-Sutcliffe efficiency (NSE), root mean square error (RMSE), mean absolute error (MAE), and the correlation coefficient (R).
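The seasonal lag design and preprocessing steps described above can be sketched as follows. This is a minimal Python/NumPy illustration, not the authors' R code, and it rests on several assumptions not stated in the text: a lag of k seasons is taken to mean the discharge 3k months before the target month (so combination n uses lags of 3, 6, ..., 3n months), the train/test split is chronological with a hypothetical 80/20 ratio, and standardization uses training-set statistics only. The names `build_lagged_inputs`, `split_and_standardize`, and `MONTHS_PER_SEASON` are illustrative.

```python
import numpy as np

MONTHS_PER_SEASON = 3  # from the text: each season spans three months


def build_lagged_inputs(q, n_seasons):
    """Build inputs for one seasonal combination (n_seasons = 1..4).

    Assumption: a lag of k seasons means the discharge 3*k months before
    the target month, so combination n uses lags of 3, 6, ..., 3*n months.
    """
    q = np.asarray(q, dtype=float)
    lags = [MONTHS_PER_SEASON * k for k in range(1, n_seasons + 1)]
    max_lag = lags[-1]
    # Column j holds q[t - lags[j]] for every target month t >= max_lag
    X = np.column_stack([q[max_lag - lag: len(q) - lag] for lag in lags])
    y = q[max_lag:]
    return X, y


def split_and_standardize(X, y, train_frac=0.8):
    """Chronological train/test split, then z-scoring of the inputs.

    train_frac=0.8 is a hypothetical choice; the means and standard
    deviations come from the training rows only to avoid test leakage.
    """
    n_train = int(len(y) * train_frac)
    X_tr, X_te = X[:n_train], X[n_train:]
    y_tr, y_te = y[:n_train], y[n_train:]
    mu = X_tr.mean(axis=0)
    sd = X_tr.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant input columns
    return (X_tr - mu) / sd, (X_te - mu) / sd, y_tr, y_te


if __name__ == "__main__":
    # Synthetic 10-year monthly series, for illustration only (not study data)
    rng = np.random.default_rng(0)
    q = rng.gamma(shape=2.0, scale=5.0, size=120)
    for n in range(1, 5):
        X, y = build_lagged_inputs(q, n)
        X_tr, X_te, y_tr, y_te = split_and_standardize(X, y)
        print(f"{n} season(s): train {X_tr.shape}, test {X_te.shape}")
```

A chronological split is used here rather than a random one because shuffling monthly records would let the models see future discharge during training; whether the study split the data this way is not stated.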
Results and Discussion: LightGBM performed significantly better than the other two models at all stations and for all temporal combinations. With NSE values of 0.908 to 0.931 and correlation coefficients of 0.896 to 0.918, it showed the closest agreement with the observed data, and its RMSE values at the studied stations ranged from 0.079 to 0.131, indicating high accuracy in predicting the maximum monthly discharge. LightGBM also reproduced seasonal fluctuations and strong flow trends more closely than the other models. The DBN model performed well, with NSE values of 0.864 to 0.896 and correlation coefficients of 0.816 to 0.832, while the XBeach numerical model (NSE of 0.834 to 0.862, correlation of 0.807 to 0.823) was less accurate than the two data-driven models. Although DBN outperformed XBeach, the difference between them was not statistically significant in most temporal combinations. Given the complexity of the DBN network structure, its long computation time, and the cost of tuning its parameters, XBeach is recommended as the more economical and practical second-ranked option. LightGBM's ability to combine nonlinear features of the input data and learn complex patterns made it the most accurate and stable option for flow prediction.
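For reference, the four evaluation indices reported above can be computed as below. This sketch uses the standard textbook definitions (NSE as one minus the ratio of the squared-error sum to the sum of squared deviations of the observations from their mean; R as the Pearson correlation). The helper names are hypothetical, and the study's own implementation may differ in details such as the handling of missing values or whether metrics are computed on standardized or original-unit discharge.

```python
import numpy as np


def _as_arrays(obs, sim):
    return np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - sum((o - s)^2) / sum((o - mean(o))^2)."""
    o, s = _as_arrays(obs, sim)
    return float(1.0 - np.sum((o - s) ** 2) / np.sum((o - o.mean()) ** 2))


def rmse(obs, sim):
    """Root mean square error, in the units of the (possibly standardized) data."""
    o, s = _as_arrays(obs, sim)
    return float(np.sqrt(np.mean((o - s) ** 2)))


def mae(obs, sim):
    """Mean absolute error."""
    o, s = _as_arrays(obs, sim)
    return float(np.mean(np.abs(o - s)))


def pearson_r(obs, sim):
    """Pearson correlation coefficient between observed and simulated values."""
    o, s = _as_arrays(obs, sim)
    return float(np.corrcoef(o, s)[0, 1])


def evaluate(obs, sim):
    """Collect the four indices used in the study into one dictionary."""
    return {"NSE": nse(obs, sim), "RMSE": rmse(obs, sim),
            "MAE": mae(obs, sim), "R": pearson_r(obs, sim)}


if __name__ == "__main__":
    # Toy values for illustration only, not study data
    print(evaluate([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))
```

For this toy pair the sketch gives NSE of about 0.98 and MAE of about 0.15. A perfect prediction yields NSE = 1 and R = 1, an NSE of 0 corresponds to simply predicting the observed mean, and negative NSE values indicate a model worse than that.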
Conclusion: Correlation analysis between observed and predicted values showed that the LightGBM points were most densely clustered along the 1:1 line with the least scatter, whereas the XBeach and DBN points deviated more from that line. These findings are consistent with recent studies reporting the effectiveness of gradient boosting algorithms for streamflow and flood prediction. Overall, the results suggest that lightweight, fast data-driven models such as LightGBM can play an important role in developing flood forecasting systems for mountainous basins such as Taleghan. Its high accuracy, fast update capability, low computational requirements, and potential for integration with remote sensing data make it a suitable option for early warning systems and intelligent water resource management. The main contribution of this research is a quantitative, systematic comparison of lightweight data-driven, deep learning, and process-based physical models for predicting maximum monthly discharge, together with an analysis of how their performance depends on the length of the input temporal memory, leading to an operational decision-making framework based on accuracy and computational cost.