
Introducing GEFSv13 Replay Dataset Designed for Training of AI Models for Coupled Prediction

January 24, 2025

Author: Sergey Frolov

The NOAA Unified Forecast System (UFS) / Global Ensemble Forecast System version 13 (GEFSv13) replay dataset was developed to provide initial conditions for the retrospective forecast archive that supports the next implementation of NOAA's medium-range forecast system (GEFSv13/GFSv17). Increasingly, the dataset is also being used to train coupled machine learning models.

Several aspects of the UFS-Replay dataset make it a unique addition to the training data already available online. In addition to the native model output in NetCDF format on Amazon Web Services (AWS), we converted the output to the cloud-native Zarr format and stored it on Google Cloud Storage (GCS); an example Jupyter notebook demonstrates how to access it. We provide both the native-resolution output at ¼ degree for the ocean, ice, land, and atmosphere and a subsampled 1-degree version. To our knowledge, UFS-Replay is the only ocean analysis dataset that provides sub-daily ocean and ice variability in the easy-to-access Zarr format.
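The relationship between the ¼-degree and 1-degree products comes down to simple grid arithmetic. The sketch below assumes, without confirmation from this article, a regular global latitude-longitude grid that includes both poles and a 1-degree product that keeps every 4th grid point; the actual UFS-Replay subsampling may differ, and `grid_shape` and `subsample` are hypothetical helper names.

```python
"""Illustrative sketch: deriving a 1-degree field from a 1/4-degree grid.

Assumption (not confirmed by the source): the data sit on a regular global
latitude-longitude grid that includes both poles, and the 1-degree product
keeps every 4th grid point. The real UFS-Replay subsampling may differ.
"""

FACTOR = 4  # 1 degree / 0.25 degree


def grid_shape(resolution_deg: float) -> tuple[int, int]:
    """Return (nlat, nlon) for a regular global grid that includes both poles."""
    nlat = round(180 / resolution_deg) + 1
    nlon = round(360 / resolution_deg)
    return nlat, nlon


def subsample(field: list[list[float]], factor: int = FACTOR) -> list[list[float]]:
    """Keep every `factor`-th row (latitude) and column (longitude), from index 0."""
    return [row[::factor] for row in field[::factor]]


if __name__ == "__main__":
    nlat, nlon = grid_shape(0.25)
    # Synthetic field: each value encodes its (row, column) position.
    quarter = [[float(i * nlon + j) for j in range(nlon)] for i in range(nlat)]
    one_deg = subsample(quarter)
    print(len(quarter), len(quarter[0]))  # 721 1440
    print(len(one_deg), len(one_deg[0]))  # 181 360
```

Striding keeps every 1-degree point exactly collocated with a ¼-degree point, which is simple and cheap, but it does not average out sub-grid variability; an area-weighted block average is the usual alternative when that matters.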

Beyond the gridded replay output, we provide evaluations in observation space through the Replay Observer Diagnostic (ROD). ROD pairs observations from the NOAA-NASA Joint Archive (NNJA) with their replay equivalents, computed with the Gridpoint Statistical Interpolation (GSI) data assimilation software. The output includes the open-access observations with their original values and metadata, the replay observation equivalents, the bias-corrected observations, and the bias coefficients estimated by GSI. The ROD output for each six-hour cycle is located in the GSI subdirectory and is stored in GSI's NetCDF-based ncdiag format.
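Because ROD stores both observations and replay equivalents, a common first diagnostic is the mean and root-mean-square (RMS) of observation-minus-background (O-B) departures, treating the replay equivalents as the background, before and after GSI bias correction. The sketch below is a minimal illustration under that assumption: the three input lists stand in for arrays you would read from one ncdiag file, and the argument names are hypothetical, not actual ncdiag variable names.

```python
"""Illustrative sketch: O-B departure statistics from ROD-style inputs.

The three lists stand in for arrays read from one GSI ncdiag file for a single
cycle. The argument names are hypothetical and are not ncdiag variable names.
"""
import math


def _summarize(departures: list[float]) -> dict[str, float]:
    """Mean and root-mean-square of a list of departures."""
    n = len(departures)
    return {
        "mean": sum(departures) / n,
        "rms": math.sqrt(sum(d * d for d in departures) / n),
    }


def departure_stats(obs, obs_bias_corrected, replay_equiv):
    """Compare O-B statistics before and after GSI bias correction."""
    if not obs or not (len(obs) == len(obs_bias_corrected) == len(replay_equiv)):
        raise ValueError("inputs must be non-empty and of equal length")
    raw = [o - b for o, b in zip(obs, replay_equiv)]
    adjusted = [o - b for o, b in zip(obs_bias_corrected, replay_equiv)]
    return {"raw": _summarize(raw), "bias_corrected": _summarize(adjusted)}


if __name__ == "__main__":
    # Made-up brightness temperatures (K), for illustration only.
    stats = departure_stats(
        obs=[251.0, 252.5, 250.0],
        obs_bias_corrected=[250.2, 251.7, 249.2],
        replay_equiv=[250.0, 251.5, 249.0],
    )
    print(stats)
```

A shrinking mean departure after bias correction is the expected signature of a working bias correction; a large remaining RMS points to random error in either the observations or the replay.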

Replay methodology: This dataset was produced by replaying (Orbe et al., 2017) the coupled version of the new UFS model to external reanalyses: ERA5 for the atmosphere and ORAS5 for the ocean and sea ice. For the replay experiment, we used the HR1 tag of the NOAA UFS coupled model, which includes atmosphere, ocean, sea-ice, land, and wave components (see Table 1 for component descriptions).

Table 1: Summary of the UFS and replay components in the GEFSv13 replay dataset

Earth system component	Model	Resolution	Replay constraint
Ocean	MOM6	Nominal ¼-degree tripolar grid, 72 hybrid vertical levels	Temperature, salinity, and horizontal currents replayed to ORAS5
Atmosphere	UFS/FV3 with HR1 physics	C384 cubed sphere (~¼ degree), 127 vertical levels	Temperature, specific humidity, horizontal winds, ozone, and surface pressure replayed to ERA5
Land	Noah-MP LSM	4 soil layers	Snow depth assimilated from NCEI Global Historical Climatology Network (GHCN) station data and snow cover from the U.S. National Ice Center Interactive Multisensor Snow and Ice Mapping System (IMS)
Wave	WAVEWATCH III	¼ degree	None
Ice	CICE6	¼-degree tripolar grid, 5 ice categories, 7 ice layers	Ice concentration inserted from ORAS5 using the JEDI Sea-ice, Ocean, and Coupled Assimilation (SOCA) system

The replay methodology lets the dataset track the evolution of selected variables from an external reference dataset while the coupled model generates the remaining, unconstrained variables consistently with UFS dynamics. For example, replay constrains atmospheric fields such as 3D temperature, humidity, and wind, as well as ocean temperature, salinity, and currents. Meanwhile, the UFS model remains free to compute its own surface fluxes between coupled components and to determine land and wave states that are not directly influenced by the external reanalysis.
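One way to keep track of which variables in a given component follow the reference and which evolve freely is a small lookup table. The sketch below paraphrases Table 1; the short variable names (for example `temp`, `spfh`, `aice`) and the `partition` helper are hypothetical labels chosen for illustration, not UFS output names.

```python
"""Illustrative sketch: which variables replay constrains and which run free.

The constrained sets paraphrase Table 1; the short variable names are
hypothetical labels, not actual UFS output variable names.
"""

REPLAY_CONSTRAINTS = {
    "atmosphere": {"reference": "ERA5",
                   "vars": {"temp", "spfh", "u", "v", "o3", "psfc"}},
    "ocean": {"reference": "ORAS5", "vars": {"temp", "salt", "u", "v"}},
    "ice": {"reference": "ORAS5", "vars": {"aice"}},
    "land": {"reference": "GHCN and IMS snow observations", "vars": {"snow_depth"}},
    "wave": {"reference": None, "vars": set()},  # wave model is not constrained
}


def partition(component: str, model_vars: list[str]) -> tuple[list[str], list[str]]:
    """Split a component's variables into (constrained, free), each sorted."""
    constrained = REPLAY_CONSTRAINTS[component]["vars"]
    return (sorted(v for v in model_vars if v in constrained),
            sorted(v for v in model_vars if v not in constrained))


if __name__ == "__main__":
    atm_vars = ["temp", "u", "v", "latent_heat_flux", "precip"]
    print(partition("atmosphere", atm_vars))
    # (['temp', 'u', 'v'], ['latent_heat_flux', 'precip'])
```

In this framing, quantities such as surface fluxes and precipitation land in the "free" list: they are the model's own response to the constrained state, which is what makes them interesting targets for coupled AI training.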

In simpler terms, “replay” makes the model closely follow the conditions provided by the external reference data without being fully restricted by them. The model therefore simulates realistic conditions while retaining the flexibility to evolve its other variables according to its own internal dynamics. This approach produces high-quality datasets for model validation, training, and development.
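The idea of following a reference without being fully restricted by it can be illustrated with a toy model. One common way to implement replay, and the pattern assumed here, is: run a free forecast over a window, compute the increment between the reference and that forecast, then rerun the same window with the increment spread evenly as a constant forcing (an incremental analysis update, or IAU, style correction). The scalar dynamics, `DRIFT`, and `DAMPING` below are hypothetical; this is a sketch of the concept, not the UFS implementation.

```python
"""Illustrative sketch: replaying a toy scalar model to a reference series.

Each window: (1) run a free forecast, (2) compute the increment between the
reference and that forecast, (3) rerun the window from the same start with the
increment spread evenly as a constant forcing (IAU-style). The toy dynamics
and constants are hypothetical, not the UFS configuration.
"""

DRIFT = 0.5    # toy model bias per unit time (hypothetical)
DAMPING = 0.1  # toy relaxation rate (hypothetical)


def model_step(x: float, dt: float, forcing: float = 0.0) -> float:
    """One forward-Euler step of a toy model with a built-in drift."""
    return x + dt * (DRIFT - DAMPING * x + forcing)


def integrate(x0: float, nsteps: int, dt: float, forcing: float = 0.0) -> float:
    """Run the toy model for nsteps with a constant forcing."""
    x = x0
    for _ in range(nsteps):
        x = model_step(x, dt, forcing)
    return x


def replay(x0: float, reference: list[float], window: float, nsteps: int) -> list[float]:
    """Return the replayed state at the end of each window."""
    dt = window / nsteps
    x, states = x0, []
    for target in reference:
        guess = integrate(x, nsteps, dt)                  # 1) free forecast
        increment = target - guess                        # 2) increment vs. reference
        x = integrate(x, nsteps, dt, increment / window)  # 3) rerun with forcing
        states.append(x)
    return states


if __name__ == "__main__":
    reference = [0.0] * 10          # the reference stays at 0
    replayed = replay(0.0, reference, window=6.0, nsteps=6)
    free = integrate(0.0, 60, 1.0)  # same 60 time units with no replay forcing
    print(f"replayed end state: {replayed[-1]:.2f}, free-run end state: {free:.2f}")
```

With these constants the replayed state settles near 0.58 instead of exactly 0, while the free run drifts toward the toy model's own equilibrium of 5. The small residual is the "not fully restricted" part: the model's internal dynamics still act within each window.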

In addition to replaying the atmospheric and oceanic states to external reanalyses, we incorporated a future version of NOAA's JEDI-based land data assimilation system (Gichamo and Draper, 2022) to constrain snow depth. This system assimilates daily station snow depth observations from the NCEI Global Historical Climatology Network (GHCN) and satellite-derived snow cover from the U.S. National Ice Center's Interactive Multisensor Snow and Ice Mapping System (IMS). The JEDI Sea-ice, Ocean, and Coupled Assimilation (SOCA) system was also used to adjust sea-ice thickness, concentration, and snow depth over ice for consistency with the ORAS5 sea-ice analysis.
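Gichamo and Draper (2022) describe an optimal interpolation (OI) approach to snow data assimilation. The sketch below reduces OI to its simplest case, a single grid point and a single observation with uncorrelated Gaussian errors, to show how the observation weight (gain) follows from the error variances. It is not the JEDI or UFS code; the variances and the `oi_update` helper are hypothetical.

```python
"""Illustrative sketch: scalar optimal interpolation (OI) for snow depth.

One grid point, one observation, uncorrelated Gaussian errors. This is the
textbook scalar case, not the JEDI/UFS land DA code; variances are made up.
"""


def oi_update(background: float, observation: float,
              var_b: float, var_o: float) -> tuple[float, float]:
    """Return (analysis, gain) for a single-point OI update."""
    if var_b < 0 or var_o < 0 or var_b + var_o == 0:
        raise ValueError("variances must be non-negative and not both zero")
    gain = var_b / (var_b + var_o)                    # weight given to the observation
    analysis = background + gain * (observation - background)
    return max(analysis, 0.0), gain                   # snow depth cannot be negative


if __name__ == "__main__":
    # Model says 0.30 m of snow; a station reports 0.50 m.
    analysis, gain = oi_update(0.30, 0.50, var_b=0.04, var_o=0.01)
    print(f"gain={gain:.2f}, analysis={analysis:.2f} m")  # gain=0.80, analysis=0.46 m
```

The full OI scheme extends this by spreading each observation's influence to nearby grid points through background-error correlations, but the gain formula above is the core of the weighting.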

The original replay dataset spans January 1994 to October 2023 at a nominal ¼-degree resolution. Since its initial production, the dataset has been extended several times toward the present, and we plan to continue these updates until GEFSv13/GFSv17 becomes operational. In addition to the ¼-degree version, we are generating a native 1-degree version by replaying the 1-degree UFS model to ERA5 and ORAS5 from 1958 to 2023. A sample of the native 1-degree replay data is already available and will be extended as new model output is generated.
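Because the replay and its ROD output are organized in six-hour cycles, planning downloads or training splits over a multi-decade record starts with knowing how many cycles a period contains. The sketch below enumerates cycle times with the standard library; the example dates are illustrations only, not the dataset's exact start and end times, and `cycles` and `count_cycles` are hypothetical helper names.

```python
"""Illustrative sketch: enumerating six-hour cycles over a date range.

Times are naive datetimes treated as UTC. The example dates are illustrations,
not the dataset's exact start and end times.
"""
from datetime import datetime, timedelta
from typing import Iterator

CYCLE = timedelta(hours=6)


def cycles(start: datetime, end: datetime) -> Iterator[datetime]:
    """Yield cycle times from start to end inclusive, every six hours."""
    if start.hour % 6 or start.minute or start.second or start.microsecond:
        raise ValueError("start must fall on a 00, 06, 12, or 18 UTC cycle")
    t = start
    while t <= end:
        yield t
        t += CYCLE


def count_cycles(start: datetime, end: datetime) -> int:
    """Number of six-hour cycles from start to end inclusive."""
    return sum(1 for _ in cycles(start, end))


if __name__ == "__main__":
    print(count_cycles(datetime(1994, 1, 1, 0), datetime(1994, 12, 31, 18)))  # 1460
    print(count_cycles(datetime(1996, 1, 1, 0), datetime(1996, 12, 31, 18)))  # 1464 (leap year)
```

Using `datetime` arithmetic rather than multiplying days by four keeps leap days correct automatically.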

The replay dataset is hosted in egress-free environments on Amazon Web Services (AWS) and Google Cloud Platform (GCP) through the NOAA Open Data Dissemination program. The original model output is stored on AWS in its native format. To facilitate analysis and support efficient AI training pipelines, we also provide a curated version of the dataset in Zarr format on GCP. (See the Data Access tab for more details.)
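For AI training pipelines, the Zarr chunk layout largely determines read efficiency, because each chunk is fetched and decompressed as a unit. The sketch below assumes the store uses Zarr v2 consolidated metadata (a `.zmetadata` JSON document) that you have already downloaded, and simple numeric dtypes such as `"<f4"`; the array name `tmp2m` and its shape are hypothetical, and the minimal metadata omits fields a real `.zarray` entry would also contain.

```python
"""Illustrative sketch: sizing Zarr v2 chunks from consolidated metadata.

Assumes Zarr v2 consolidated metadata (a `.zmetadata` JSON document) and
simple numeric dtypes such as "<f4". The example array name and shape are
hypothetical, not the actual UFS-Replay layout.
"""
import json
import math


def summarize_arrays(zmetadata_text: str) -> dict[str, dict]:
    """Map each array name to its shape, chunks, chunk count, and bytes per chunk."""
    meta = json.loads(zmetadata_text)["metadata"]
    summary = {}
    for key, attrs in meta.items():
        if not key.endswith("/.zarray"):
            continue                        # skip .zgroup and .zattrs entries
        name = key[: -len("/.zarray")]
        shape, chunks = attrs["shape"], attrs["chunks"]
        n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
        itemsize = int(attrs["dtype"][2:])  # "<f4" -> 4 bytes (simple dtypes only)
        summary[name] = {
            "shape": tuple(shape),
            "chunks": tuple(chunks),
            "n_chunks": n_chunks,
            "chunk_bytes": math.prod(chunks) * itemsize,  # uncompressed size
        }
    return summary


if __name__ == "__main__":
    example = json.dumps({
        "zarr_consolidated_format": 1,
        "metadata": {
            ".zgroup": {"zarr_format": 2},
            "tmp2m/.zarray": {"shape": [8, 721, 1440], "chunks": [1, 721, 1440],
                              "dtype": "<f4"},
            "tmp2m/.zattrs": {"_ARRAY_DIMENSIONS": ["time", "lat", "lon"]},
        },
    })
    print(summarize_arrays(example)["tmp2m"])
```

In this hypothetical layout, one chunk holds a single global time slice (about 4 MB uncompressed), which suits pipelines that sample individual times; pipelines that read long time series at a few points would favor chunks that are long in time and small in space.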

For additional information, visit the UFS Replay page of the NOAA Physical Sciences Laboratory.

References

Gichamo, T. Z., and C. S. Draper, 2022: An Optimal Interpolation–Based Snow Data Assimilation for NOAA’s Unified Forecast System (UFS). Wea. Forecasting, 37, 2209–2221, https://doi.org/10.1175/WAF-D-22-0061.1

Orbe, C., L. D. Oman, S. E. Strahan, D. W. Waugh, S. Pawson, L. L. Takacs, and A. M. Molod, 2017: Large-scale atmospheric transport in GEOS replay simulations. J. Adv. Model. Earth Syst., 9, 2545–2560, https://doi.org/10.1002/2017MS001053
