July 17, 2025
Research Highlight

Predicting River Sediment Respiration Rates Using Machine Learning

Data-driven models enhance understanding of biogeochemical processes in river environments

Researchers analyze sediment samples collected by the Worldwide Hydrobiogeochemistry Observation Network for Dynamic River Systems consortium. They measure how dissolved oxygen concentration changes over time and calculate respiration rates from those changes. The resulting data are used to train machine learning models that predict respiration at unsampled locations.

(Photo by Andrea Starr | Pacific Northwest National Laboratory)
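
The step from dissolved oxygen readings to a respiration rate can be illustrated with a minimal sketch. It assumes the rate is taken as the negated slope of a straight-line fit to oxygen concentration over time; the highlight does not describe the consortium's actual calculation, and the function name, units, and readings below are hypothetical.

```python
import numpy as np


def respiration_rate(time_h, do_mg_per_l):
    """Oxygen consumption rate in mg O2 per liter per hour.

    Assumption: the rate is the negated least-squares slope of dissolved
    oxygen against time, so falling oxygen gives a positive rate.
    """
    slope, _intercept = np.polyfit(time_h, do_mg_per_l, deg=1)
    return -slope


# Made-up readings for illustration only (hours, mg O2 per liter).
time_h = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
do_mg_per_l = np.array([9.0, 8.5, 8.0, 7.5, 7.0])

rate = respiration_rate(time_h, do_mg_per_l)
print(f"Respiration rate: {rate:.2f} mg O2/L/h")
```

A straight-line fit is only one option; a real workflow might normalize by sediment mass or restrict the fit to an initial linear window.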

The Science

Researchers from Pacific Northwest National Laboratory (PNNL) and Parallel Works, Inc., applied machine learning (ML) methods to predict how much oxygen and nutrients are used by microorganisms in river sediments. Their ML approaches considered over 100 environmental factors, such as stream order and soil characteristics, which vary across the contiguous United States. Most of the input data came from public databases, but the direct measurements of oxygen use were generated from samples crowdsourced through the Worldwide Hydrobiogeochemistry Observation Network for Dynamic River Systems (WHONDRS) consortium. The team found that the chemistry of organic matter in the sediment is crucial for predicting oxygen use by microorganisms, and created maps showing how these predictions vary across the Columbia River Basin in the Pacific Northwest.

The Impact

The research helps explain which environmental factors affect oxygen and nutrient use by microorganisms in river sediments, processes that in turn control fluxes of carbon dioxide. Using machine learning, researchers can predict this microbial activity over large areas, which could support regional and global models of the Earth system. This information can also be vital for managing river health and ecosystem services. For example, it can help improve water quality by informing policies and practices for river management.

Summary

This study used community-generated data and public river databases to compile a large dataset with 367 samples and 133 features, although not all samples had all features. A multi-institutional team of researchers developed a two-tiered ML approach that combines a stacked ensemble of models with hyperparameter optimization and a feature permutation importance analysis to identify significant features. The team discovered that sediment organic matter chemistry, especially as determined by high-resolution mass spectrometry, is a critical factor in predicting sediment respiration rates. Larger-scale variables, such as stream order, geography, and population, also play significant roles. Combining these data, researchers generated spatial maps predicting oxygen consumption across the Columbia River Basin. This approach not only helps in parameterizing watershed models but also offers a template for other applications by providing a modular, portable, open, and cloud-based implementation.
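
The general pattern of a stacked ensemble with tuned hyperparameters followed by feature permutation importance can be sketched briefly. This is not the team's implementation: scikit-learn, the choice of base models, the parameter grid, and the median imputation of missing values are all assumptions made for illustration, and the data are synthetic stand-ins rather than WHONDRS measurements.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.impute import SimpleImputer
from sklearn.inspection import permutation_importance
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: 367 samples but only 10 features (the study used 133).
# By construction, only features 0 and 1 drive the target.
n_samples, n_features = 367, 10
X = rng.normal(size=(n_samples, n_features))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=n_samples)

# Blank out about 5% of entries to mimic samples that lack some features.
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Tier 1: stacked ensemble. Base models feed a ridge meta-learner.
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("knn", make_pipeline(StandardScaler(), KNeighborsRegressor())),
    ],
    final_estimator=RidgeCV(),
)
# Assumption: missing values are filled with the per-feature median.
model = make_pipeline(SimpleImputer(strategy="median"), stack)

# Small illustrative grid search over base-model hyperparameters.
search = GridSearchCV(
    model,
    param_grid={
        "stackingregressor__rf__max_depth": [4, None],
        "stackingregressor__knn__kneighborsregressor__n_neighbors": [3, 7],
    },
    cv=3,
)
search.fit(X_train, y_train)

# Tier 2: shuffle one feature at a time on held-out data and record
# how much the model's score drops.
result = permutation_importance(
    search, X_test, y_test, n_repeats=5, random_state=0
)
ranking = np.argsort(result.importances_mean)[::-1]

print("Held-out R^2:", round(search.score(X_test, y_test), 3))
print("Features ranked by importance:", ranking.tolist())
```

Because the synthetic target depends only on the first two features, those should rise to the top of the ranking; in a real dataset, correlated features can split or mask importance, so rankings deserve cautious interpretation.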

Research Contact

Timothy Scheibe, Tim.Scheibe@pnnl.gov

Funding

This research was supported by the Small Business Innovation Research program of the Department of Energy Office of Science and by the DOE Biological and Environmental Research program through the River Corridor Science Focus Area project at PNNL. WHONDRS data were generated in part using capabilities available at the Environmental Molecular Sciences Laboratory, a DOE Office of Science user facility at PNNL.

Related Links

Gary, S. F., T. D. Scheibe, E. Rexer, A. V. Torreira, V. A. Garayburu-Caruso, A. Goldman, and J. C. Stegen. 2024. “Prediction of distributed river sediment respiration rates using community-generated data and machine learning.” Journal of Geophysical Research: Machine Learning and Computation, 1, e2024JH000199, doi: 10.1029/2024JH000199.