The National Oceanic and Atmospheric Administration’s Global Systems Laboratory (NOAA-GSL), in collaboration with the University of Colorado’s Cooperative Institute for Research in Environmental Sciences (CIRES) and NOAA’s Earth Prediction Innovation Center (EPIC), has developed a data pre-processing system to support prototyping of future versions of the Rapid Refresh Forecast System (RRFS) aerosol component. Future RRFS versions will use the Model for Prediction Across Scales (MPAS) dynamical core, which, among other features, is built on an unstructured, hexagonal grid. By contrast, the Finite Volume Cubed-Sphere (FV3) dynamical core implemented in RRFSv1 uses a regular, structured grid for regional modeling. This fundamental change in grid structure required a new approach to pre-processing, particularly to the regridding step of the data ingestion pipeline. Furthermore, atmospheric composition modeling requires numerous data products, which can significantly degrade performance if I/O is not optimized. The resulting system is a significant step toward a unified approach to high-resolution aerosol modeling across North America, a key part of modeling the Earth system, and contributes to NOAA’s ambitious goal of a fully coupled, convection-allowing, 3-km global model using MPAS.
To address these challenges, the team developed a lightweight Python wrapper around the popular esmpy, netcdf4-python (with parallel I/O), and pydantic libraries: esmpy provided the core regridding algorithms, netcdf4-python handled I/O operations, and pydantic provided runtime configuration and type checking. The wrapper’s goal was to provide a standardized approach to processing aerosol products that GSL/CIRES scientists could adapt to new datasets and, potentially, extend to other domains. Although moving quickly was important during initial development of the cheMPAS-fire pipeline, the stack was built with Message Passing Interface (MPI)-based parallelism at its core so that the system could scale to higher spatial resolutions and larger datasets. The wrapper is part of the cheMPAS-Fire software ecosystem.
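The text does not say how work is divided among MPI ranks. As a minimal sketch, assuming a contiguous block decomposition of mesh cells (a common choice for parallel netCDF reads, but not confirmed as the team’s method), each rank could compute its own slice of cells as below. The helper name `block_range` is hypothetical. Under mpi4py, `rank` and `size` would come from `MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()`; the demo simply loops over four simulated ranks so it runs without MPI.

```python
def block_range(n_items: int, rank: int, size: int) -> tuple[int, int]:
    """Return the half-open [start, stop) range of items owned by `rank`.

    Hypothetical helper: the first `n_items % size` ranks each receive one
    extra item, so per-rank counts differ by at most one.
    """
    if size <= 0 or not 0 <= rank < size:
        raise ValueError("require size > 0 and 0 <= rank < size")
    if n_items < 0:
        raise ValueError("n_items must be non-negative")
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


if __name__ == "__main__":
    # Simulate 4 MPI ranks splitting 10 mesh cells; with mpi4py, each real
    # rank would call block_range once with its own rank and the comm size.
    for r in range(4):
        print(r, block_range(10, r, 4))
```

Each rank could then read only `variable[start:stop]` from a file opened in parallel mode; netcdf4-python’s `Dataset` accepts `parallel=True` and a `comm` argument for this when the underlying netCDF library is built with parallel support.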
The first iteration of the cheMPAS-fire pre-processor eventually ran into scaling issues as model and data grid resolutions approached 1 km and the number of processed data products grew. In response, the team revisited the implementation and identified ways to address the performance bottlenecks. First, the runtime environment was moved to a chained, machine-native spack-stack environment to address I/O and memory limitations. Second, with the help of UXarray, the target grid format was migrated to UGRID, a standard format for unstructured grids that scales better with the Earth System Modeling Framework (ESMF). With these two improvements, the team observed a fivefold reduction in processing time and gained the ability to run the pre-processor across multiple high-performance computing (HPC) compute nodes. Further work focused on code management, including end-to-end integration testing, linter support, and a Docker environment for local and Continuous Integration (CI)-based development.
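UGRID describes a 2D unstructured mesh through a mesh-topology variable (carrying `cf_role = "mesh_topology"`, `topology_dimension = 2`, `node_coordinates`, and `face_node_connectivity` attributes) plus a face-to-node connectivity array. The plain-Python sketch below illustrates the kind of index translation involved when MPAS cell-to-vertex connectivity is expressed in UGRID form. It assumes MPAS-style `verticesOnCell` rows that are 1-based and padded beyond `nEdgesOnCell`, and uses -1 as the fill value for unused slots; the function names and the `mesh` variable prefix are hypothetical, and this is not UXarray’s implementation.

```python
def mpas_to_ugrid_faces(vertices_on_cell, n_edges_on_cell, fill_value=-1):
    """Convert MPAS-style verticesOnCell rows (1-based, padded past
    nEdgesOnCell) into UGRID-style 0-based face_node_connectivity rows
    padded with `fill_value`."""
    faces = []
    for row, n_edges in zip(vertices_on_cell, n_edges_on_cell):
        if not 0 < n_edges <= len(row):
            raise ValueError(f"invalid edge count {n_edges} for row of length {len(row)}")
        face = [v - 1 for v in row[:n_edges]]  # shift 1-based MPAS ids to 0-based
        face.extend([fill_value] * (len(row) - n_edges))  # pad unused slots
        faces.append(face)
    return faces


def ugrid_topology_attrs(mesh="mesh"):
    """Attributes for a 2D UGRID mesh-topology variable (hypothetical names)."""
    return {
        "cf_role": "mesh_topology",
        "topology_dimension": 2,
        "node_coordinates": f"{mesh}_node_x {mesh}_node_y",
        "face_node_connectivity": f"{mesh}_face_nodes",
    }


if __name__ == "__main__":
    # Two cells: a hexagon and a pentagon whose last slot is padding (0).
    verts = [[1, 2, 3, 4, 5, 6], [3, 4, 7, 8, 9, 0]]
    print(mpas_to_ugrid_faces(verts, [6, 5]))
    # -> [[0, 1, 2, 3, 4, 5], [2, 3, 6, 7, 8, -1]]
    print(ugrid_topology_attrs()["cf_role"])
```

In a netCDF file, the connectivity variable would additionally carry `cf_role = "face_node_connectivity"`, `start_index = 0`, and a `_FillValue` matching the padding value.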
Looking ahead, the team aims to merge these improvements into future operational codebases, further modularize the pre-processor to improve efficiency and configurability, and explore additional optimizations.
Acknowledgements:
Funding was provided by CIRES Task II, NOAA cooperative agreement NA22OAR4320151, for the Cooperative Institute for Earth System Research and Data Science (CIESRDS); the Infrastructure Investment and Jobs Act (Public Law 117-58); and the NOAA Weather Program Office’s Earth Prediction Innovation Center (EPIC).



