Flood Forecasting Performance Comparable to the U.S. National Weather Service: The Knowledge-Guided Machine Learning Model FHNN Improves Forecast Accuracy by Integrating Real-Time Observation Data

Floods are among the most common and widespread natural disasters globally, posing a long-term threat to socio-economic development and public safety. With climate change leading to increased frequency of extreme rainfall events, flood risk is showing a significant upward trend in many regions. Accurate and timely flood forecasts not only provide crucial information for disaster prevention and mitigation but also offer critical decision support for water resource allocation, urban management, and agricultural production.
For a long time, flood forecasting has relied mainly on physical process-based models (PBMs). Grounded in hydrological cycle theory, these models predict runoff by simulating processes such as precipitation, evaporation, changes in soil moisture, groundwater recharge, and river routing. For example, the Sacramento Soil Moisture Accounting Model (SacSMA), widely used by the U.S. National Weather Service, is a typical watershed hydrological model. Physical models have a clear scientific basis and play a crucial role in hydrological research and operational forecasting. However, they typically require complex parameter calibration, and their ability to simulate strongly nonlinear hydrological processes is often limited.
In recent years, AI has advanced rapidly in hydrology, and deep learning models are increasingly used for runoff prediction. Time-series neural networks such as Long Short-Term Memory (LSTM) networks can learn complex rainfall-runoff relationships from large amounts of historical data, and many studies have shown that they predict better than traditional models.
However, purely data-driven models also face new challenges. On the one hand, these models often lack physical interpretability and struggle to reflect real hydrological processes; on the other hand, their ability to generalize to extreme climate events or ungauged watersheds remains uncertain. A new research approach has therefore gradually emerged in the hydrological community: integrating domain knowledge into machine learning models to build models that are both highly predictive and consistent with physical laws. This direction is known as "Knowledge-Guided Machine Learning (KGML)".
Against this backdrop, a research team from the University of Minnesota Twin Cities has developed a new knowledge-guided machine learning model. Its algorithmic structure is directly inspired by hydrological science, and it is called the Factorized Hierarchical Neural Network (FHNN). The research shows that, at lead times of 2–7 days after forecast release, the model performs comparably to or even better than the National Weather Service's flood forecasts, and outperforms mainstream machine learning methods that do not build physical science knowledge into their structure.
The relevant research findings, titled "Knowledge-Guided Machine Learning for Operational Flood Forecasting," have been published in Water Resources Research.
Research highlights:
* The proposed method integrates observational information through an inverse model to construct a hierarchical multi-scale watershed state representation.
* Starting 12–18 hours after forecast generation, the FHNN model generally outperformed expert human forecasters using physically based models.
* The proposed method outperforms a state-of-the-art alternative model (autoregressive LSTM), particularly in arid watersheds, regions that are often difficult to predict using other methods.

Paper address:
https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2024WR039064
Follow our official WeChat account and reply "FHNN" to get the full PDF.
Datasets: Balancing benchmark and operational datasets
To validate the model's predictive power, the researchers used two types of datasets:
Large-sample CAMELS-US benchmark dataset
During the model training and basic evaluation phases, the well-known CAMELS-US dataset was used. CAMELS (Catchment Attributes and Meteorology for Large-Sample Studies) is one of the most influential datasets in hydrological research in recent years, and its core feature is that it contains a large amount of long-term hydrological and meteorological observation data from watersheds. The CAMELS-US dataset covers hundreds of watersheds across the continental United States, including daily data on precipitation, temperature, evapotranspiration, and river runoff. It also provides rich watershed attribute information, such as topography, climate type, soil conditions, and vegetation cover. This information provides an important foundation for studying hydrological processes under different environmental conditions.
In this study, the researchers selected 531 watersheds as experimental subjects. The data are divided chronologically into training, validation, and testing periods:
* 1985–1993 as the training period
* 1993–1995 as the validation period
* 1995–2005 as the testing period
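The chronological split above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the article gives only years, so the 1 October boundaries (the conventional start of the U.S. water year) and the names `SPLITS` and `assign_split` are assumptions.

```python
from datetime import date

# Assumed boundaries: the article states only the years, so the exact
# start and end days (1 October / 30 September) are a guess.
SPLITS = {
    "train": (date(1985, 10, 1), date(1993, 9, 30)),
    "validation": (date(1993, 10, 1), date(1995, 9, 30)),
    "test": (date(1995, 10, 1), date(2005, 9, 30)),
}


def assign_split(day):
    """Return the split name for a date, or None if it is outside all periods."""
    for name, (start, end) in SPLITS.items():
        if start <= day <= end:
            return name
    return None
```

Splitting by time rather than at random keeps every test-period day strictly later than the training data, which is closer to how a forecast model is used in practice.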
Operational flood forecast data
In addition to the standard dataset, the study further introduced real operational flood forecast data to test the model's performance in actual forecasting environments. The study selected several river basins under the jurisdiction of the National Weather Service North Central River Forecast Center (NCRFC) as case studies. These watersheds are located in the Midwestern United States and have typical continental climate characteristics, experiencing both floods caused by heavy rainfall and snowmelt floods, making them highly representative. The hydrological data come primarily from river flow observations by the U.S. Geological Survey (USGS), while meteorological data such as precipitation and temperature come from the National Weather Service's forecast database.
It is worth noting that the U.S. National Weather Service's flood forecasting system employs a "human-in-the-loop" forecasting model. In this model, a physical model first generates initial forecast results, which are then adjusted by experienced hydrologists based on real-time observations and their expertise, ultimately forming the official forecast. This method can significantly improve forecast accuracy in many cases. Comparing human expert forecasts with automated machine learning models is therefore significant in this study, because it directly reflects the application potential of AI models in real-world operational environments.
Model Framework: Knowledge-Guided Architecture FHNN
FHNN is a knowledge-guided architecture designed to model complex, hierarchical system dynamics processes across multiple time scales.
This hierarchical interaction structure is crucial for watershed hydrological modeling. For example, a rainstorm event can cause rapid changes in near-surface soil moisture storage, which is then utilized by plants through evapotranspiration, a process that can vary on hourly, daily, and seasonal scales. Simultaneously, this rainfall can replenish groundwater storage through deeper soil layers over time, where changes are typically more gradual. Furthermore, near-surface soil moisture storage also influences how much rainfall or snowmelt is converted into flood runoff.
The FHNN method aims to capture these multi-scale and hierarchical processes that are ubiquitous in hydrology and runoff generation. Its overall architecture is shown in the following diagram:

In the FHNN architecture, knowledge is introduced in two ways:
Method 1: Using an encoder-decoder architecture
This approach explicitly models the forward and reverse processes using a state encoder (inverse model) and a response decoder (forward model). The encoder is considered an "inverse model," whose main function is to infer the current internal state of the watershed using historical meteorological and runoff data. For example, by analyzing past changes in precipitation, temperature, and runoff, the model can estimate key variables such as current soil moisture content and groundwater reserves. Although these variables are difficult to observe directly in reality, they can be effectively estimated using machine learning methods. After obtaining the watershed state, the model enters the decoding phase.
The decoder is considered a "forward model" whose task is to predict future runoff changes based on known watershed conditions and future weather forecasts.
The FHNN model is trained end-to-end to minimize the difference between the predicted and actual response data. Furthermore, the architecture updates the encoder state in real time whenever a runoff observation (response) is obtained, enabling dynamic data integration.
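The encode-then-decode data flow can be illustrated with a toy example. The sketch below is not FHNN: both learned LSTM components are replaced by a hand-written linear reservoir (runoff = `K` × storage), and `K`, `gain`, `step`, `encode`, and `decode` are hypothetical names chosen for illustration. It only shows how an inverse step turns past forcing and runoff into a state, and how a forward step turns that state plus future forcing into a forecast.

```python
K = 0.2  # hypothetical recession constant of the toy reservoir


def step(storage, precip):
    """One forward step of the toy reservoir; returns (new_storage, runoff)."""
    storage += precip
    runoff = K * storage
    return storage - runoff, runoff


def encode(history_precip, history_runoff, gain=0.5):
    """Inverse-model stand-in: estimate current storage from past data."""
    storage = 0.0
    for precip, observed in zip(history_precip, history_runoff):
        storage, _ = step(storage, precip)
        # Storage implied by the observed runoff under the toy reservoir.
        implied = observed * (1 - K) / K
        # Nudge the simulated state toward the observation-implied state.
        storage += gain * (implied - storage)
    return storage


def decode(storage, future_precip):
    """Forward-model stand-in: roll the state forward under forecast forcing."""
    forecast = []
    for precip in future_precip:
        storage, runoff = step(storage, precip)
        forecast.append(runoff)
    return forecast
```

Calling `encode` again whenever a new runoff observation arrives, and then `decode` on the refreshed state, mirrors the real-time state update described above, with the important difference that FHNN learns both mappings from data instead of assuming a reservoir equation.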
Method 2: Introducing knowledge into the FHNN architecture through hierarchical factorization design
In this design, the encoder of FHNN is constructed to capture multi-scale processes and their interactions. The hierarchical state encoder uses multiple bidirectional LSTMs to take historical runoff observations and meteorological data as input and generate embeddings for different time resolutions/scales (e.g., slow, medium, and fast).
These embeddings provide compressed representations of the information contained in historical forcing data, system responses, and their multi-scale interactions (seasonal, sub-seasonal, and daily/sub-daily scales). As compressed representations of latent system states (e.g., soil moisture, spatial connectivity, snowpack), the embeddings are concatenated to initialize the hidden and cell states of the decoder. The decoder then takes future weather drivers as input to generate runoff predictions. The encoder and decoder are jointly trained using a single objective function, which minimizes the root mean square error (RMSE) between predicted and observed runoff within the target prediction time window.
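The training objective can be written out explicitly. The notation here is assumed, not taken from the paper: $q_t$ is the observed runoff and $\hat{q}_t$ the predicted runoff at step $t$ of a forecast window of length $T$.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{T} \sum_{t=1}^{T} \left( \hat{q}_t - q_t \right)^2}
```

Because the errors are squared before averaging, large misses such as an underestimated flood peak contribute more to the loss than many small errors of the same total size.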
A bidirectional LSTM reads sequences from both directions simultaneously, enabling the encoder to use all available relationships in the observation data to gain a more comprehensive understanding of the state within the watershed. This approach also has an intuitive interpretation in hydrology. For example, researchers can obtain soil moisture information by observing rainfall and its delayed runoff response; similarly, they can infer soil moisture conditions by first observing the runoff response and then analyzing the rainfall input that produced the event. The bidirectional LSTM encoder enables the model to analyze historical data from these two "perspectives" and obtain the final "best estimate" used to initialize the decoder's hidden and cell states.
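What "reading from both directions" means can be shown with a toy recurrence. In this sketch an exponential moving average stands in for the LSTM cell, so it illustrates the bidirectional pattern only, not the FHNN encoder; `recurrent_pass`, `bidirectional_summary`, and `decay` are hypothetical names.

```python
def recurrent_pass(sequence, decay=0.5):
    """Run the toy recurrence h = decay * h + (1 - decay) * x; return the final h."""
    state = 0.0
    for value in sequence:
        state = decay * state + (1 - decay) * value
    return state


def bidirectional_summary(sequence, decay=0.5):
    """Concatenate the final states of a forward and a backward pass."""
    forward = recurrent_pass(sequence, decay)
    backward = recurrent_pass(list(reversed(sequence)), decay)
    return [forward, backward]
```

For the input `[0.0, 0.0, 8.0]` the forward state is dominated by the late value while the backward state has mostly forgotten it, so the two halves of the summary carry different views of the same history.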
FHNN overall outperforms expert human forecasters who use physically based models
Researchers demonstrated the predictive power of FHNN in hydrological forecasting through two sets of experiments. The first compared FHNN with LSTM-AR, a leading deep learning method with the same input variables and data integration capabilities, on the large-sample CAMELS dataset. The second focused on operational forecasting, evaluating FHNN at official NWS forecast sites in the Midwestern United States.
Comparison with LSTM model
On the CAMELS-US dataset, FHNN was compared with the traditional autoregressive LSTM model (LSTM-AR). FHNN outperforms LSTM-AR in both the 7-day prediction period and overall prediction. Even when both models were trained only on 1-day forecasts, FHNN still showed better performance. The overall performance is shown in the table below:

The researchers also plotted the performance differences against the characteristics of each watershed and found that FHNN outperformed LSTM-AR in watersheds with low precipitation, low runoff coefficients, and high aridity, as shown below:

Relationship between precipitation, runoff coefficient, and aridity index
No obvious trends were observed for the baseflow index, potential evapotranspiration (PET), or watershed slope. This result indicates that FHNN shows its greatest performance advantage over LSTM-AR in arid watersheds and in watersheds with a low ratio of total runoff to total precipitation.
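The two watershed descriptors behind this result are simple ratios. The sketch below computes the runoff coefficient as total runoff over total precipitation, as described above. The aridity formula (total PET over total precipitation) follows the usual CAMELS convention and is an assumption about what the article's index refers to; the function names are illustrative.

```python
def runoff_coefficient(runoff, precipitation):
    """Ratio of total runoff to total precipitation over the same period."""
    total_precip = sum(precipitation)
    if total_precip <= 0:
        raise ValueError("total precipitation must be positive")
    return sum(runoff) / total_precip


def aridity_index(potential_et, precipitation):
    """Assumed definition: total PET divided by total precipitation."""
    total_precip = sum(precipitation)
    if total_precip <= 0:
        raise ValueError("total precipitation must be positive")
    return sum(potential_et) / total_precip
```

All inputs are expected in the same depth units (for example mm per day), so both ratios are dimensionless. Under this assumed definition an aridity index above 1 indicates that evaporative demand exceeds precipitation.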
Researchers also compared FHNN with LSTM-AR in the KALI4 watershed of the NWS, and further compared it with the predictive capabilities of NWS expert human forecasters, as shown in the figure below:

The results show that on the first day after the forecast is issued, the predictive ability of NWS expert forecasters using the SacSMA model is higher than that of FHNN and LSTM-AR; however, within the same time period, FHNN still outperforms LSTM-AR and demonstrates better data integration capabilities in flood events. At forecast lead times of 2–4 days and longer, FHNN exhibits the highest relative predictive power compared to NWS forecasts and LSTM-AR.
Comparison with operational forecasts
The study also analyzed 46 real flood events, and the results showed that FHNN outperformed the official forecast in 65% of the events, as shown in the table below:

In terms of forecast lead time, for water level (stage) forecasts, which are the forecasts NWS actually issues, FHNN began to outperform NWS expert forecasters 12 hours (2 time steps) after forecast release. For flow (discharge) forecasts, FHNN outperformed NWS expert forecasters after 18 hours (3 time steps). From day 2 through day 3–4 (depending on the evaluation metric), FHNN's predictive ability is significantly higher than that of human forecasters. After day 4, there is no longer a significant difference in predictive ability between FHNN and human forecasters.
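Both kinds of comparison in this section, the share of events won and the first lead time at which one forecaster beats another, reduce to simple bookkeeping over error values. The following sketch uses hypothetical helper names and makes no claim about which error metric the study used per event. The 6-hour default step is inferred from "12 hours (2 time steps)".

```python
def win_rate(model_errors, reference_errors):
    """Fraction of events in which the model's error is strictly lower."""
    if len(model_errors) != len(reference_errors) or not model_errors:
        raise ValueError("need two equal-length, non-empty error lists")
    wins = sum(1 for m, r in zip(model_errors, reference_errors) if m < r)
    return wins / len(model_errors)


def first_lead_time_better(model_errors, reference_errors, step_hours=6):
    """Earliest lead time (hours) at which the model's error is lower, else None.

    Element i of each list is the error at lead step i + 1.
    """
    pairs = zip(model_errors, reference_errors)
    for step_index, (m, r) in enumerate(pairs, start=1):
        if m < r:
            return step_index * step_hours
    return None
```

With made-up error curves `[0.5, 0.3, 0.3]` for the model and `[0.2, 0.4, 0.5]` for the reference, `first_lead_time_better` returns 12, meaning the model first has the lower error at the second 6-hour step.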
Flood peak prediction capability
A key performance indicator is the error in predicting the stream stage crest, the highest value reached in the hydrograph during a particular rainfall or snowmelt event. The researchers therefore evaluated the performance of FHNN and NWS human forecasters in flood peak prediction (both using uncertain future precipitation forecasts). They also compared the performance of FHNN and the human forecasters with the underlying SacSMA model used by the forecasters. The results showed that FHNN significantly outperforms physical models without human correction in flood peak prediction, but still falls slightly short of expert forecasts.
Human forecasters outperformed FHNN in flood peak estimation across almost all lead times (except for lead times of approximately 60 hours or more), as shown in the figure below:
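The crest error can be computed directly from a forecast hydrograph and the matching observations. This is a minimal sketch with hypothetical function names. It compares crest heights only and ignores any error in crest timing, which the study may treat differently.

```python
import math


def crest(stage_series):
    """Return the highest stage reached in a hydrograph."""
    return max(stage_series)


def crest_error(forecast_stage, observed_stage):
    """Signed crest error; positive means the forecast crest was too high."""
    return crest(forecast_stage) - crest(observed_stage)


def crest_rmse(events):
    """RMSE of crest errors over (forecast_stage, observed_stage) pairs."""
    errors = [crest_error(forecast, observed) for forecast, observed in events]
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

Evaluating `crest_rmse` separately for forecasts issued at each lead time before the observed crest would give an error-versus-lead-time curve of the kind discussed in this section.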

However, even with incompletely known future weather conditions, FHNN's estimation of flood peaks still outperforms the SacSMA model, which is driven solely by observed precipitation but without forecaster intervention.
Between 48 and 18 hours before the flood peak, FHNN achieves a rate of forecast improvement similar to that of human forecasters through data integration. During this period, forecasts were updated every 6 hours, and the flood peak prediction error (RMSE) decreased by about 0.2 feet. However, human forecasters maintained their predictive advantage in all forecasts issued within 2.5 days before the flood peak. From 12–18 hours (2–3 time steps) before the flood peak, FHNN's flood peak RMSE essentially stopped decreasing and even increased slightly.
This indicates that FHNN is less sensitive to system changes than human forecasters as the flood peak approaches. This result is consistent with the comparisons of overall forecasting capability, where NWS has higher predictive power in the first 12–18 hours after forecast issuance. The insufficient response of FHNN near flood peaks may be related to the difficulty of predicting extreme values. For any LSTM model, predicting the highest flood peaks is often difficult because there are relatively few extreme flood events in the training data.
Progress in the application of artificial intelligence in hydrological research
In recent years, artificial intelligence technology has been profoundly changing the technological approach to hydrological research and operational forecasting. From early methods based on statistical regression to today's data-driven models represented by deep learning, hydrological forecasting is gradually moving towards a more intelligent and automated stage of development.
At the application level, temporal deep learning models, represented by Long Short-Term Memory (LSTM) networks, have become one of the mainstream tools for hydrological forecasting. Numerous studies have shown that these models generally outperform traditional physical models in multi-basin runoff simulation, especially in areas with abundant data.
In recent years, the Transformer architecture has gradually been introduced into hydrology. Its advantages in long-sequence modeling have opened new possibilities for capturing long-term hydrological memory. At the same time, academia and engineering have gradually recognized that relying solely on data-driven models has certain limitations. For example, a lack of physical constraints may lead to results that do not conform to hydrological laws in extreme cases, and the models' interpretability is also weak. "Physics-informed" or "knowledge-guided" machine learning methods have therefore become a new research hotspot.
In terms of recent research advancements, multi-source data fusion is becoming an important direction for improving the capabilities of hydrological models. Combining remote sensing data (such as satellite precipitation, soil moisture, and snow water equivalent) with ground observation data enables models to obtain more comprehensive watershed information. Simultaneously, graph neural networks (GNNs) are also beginning to be used for modeling spatial relationships between watersheds, contributing to improved flood prediction capabilities at the regional scale.
Recently, Google Research open-sourced the Groundsource flood dataset, which extracts validated ground information from unstructured data, enabling the mapping of historical disaster footprints with unprecedented accuracy. Researchers automated the processing of over 5 million news reports from more than 150 countries, ultimately compiling over 2.6 million records of historical flood events and giving global flood research data of unprecedented scale and coverage.
The "Groundsource Global Flood Events Dataset" is now available in the datasets section of the HyperAI website (hyper.ai) and can be used online.
https://go.hyper.ai/KO3dB
Earlier, Grey Nearing and his team at Google Research developed a machine learning-based river forecast model that can reliably predict floods up to 5 days in advance. For floods with a 5-year return period, the model's accuracy matches or exceeds that of the current method for floods with a 1-year return period. The system covers more than 80 countries.
Paper Title: Global prediction of extreme floods in ungauged watersheds
Paper address: https://www.nature.com/articles/s41586-024-07145-1
From an operational perspective, artificial intelligence will not completely replace traditional hydrological forecasters, but is more likely to play a role through "human-machine collaboration." AI models can provide fast and stable predictions, while experts can correct and judge key scenarios based on experience. This collaborative model can not only improve forecasting efficiency but also help enhance the reliability of the system in extreme events. As data scale expands and algorithms improve, future flood forecasting systems are expected to become more intelligent, efficient, and adaptable, providing stronger technical support for disaster prevention and mitigation and water resource management.
References:
1. https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2024WR039064
2. https://phys.org/news/2026-03-ai-higher-accuracy-current-methods.html
3. https://mp.weixin.qq.com/s/ZWU-v_4k7FIm0MoDh6Rxuw
4. https://www.nature.com/articles/s41586-024-07145-1








