WEFTEC 2024 PROCEEDINGS
Understanding the Potential for Reinforcement Learning-Based WRRF Control Optimization
Abstract
Background

The most widely employed process control strategies within water resource recovery facilities (WRRFs) are model-based and built on expert knowledge. Examples include ammonia-based aeration control (ABAC) and nitrate-linked internal mixed liquor recycle (IMLR) control. Even where machine learning algorithms are integrated, they are based on supervised learning, in which the model is trained with the 'correct answer' embedded within the training data. While this approach is often successful, it relies on high-performing historical control on which to train the model and can be handicapped by human biases toward certain operational modes. Control strategies based on reinforcement learning (RL) are different. In RL, the algorithm (or RL agent) interacts directly with the control environment during training, learning to map observed state data to optimal actions so as to maximize a specified reward (Figure 1) (Silver et al., 2018; Sutton & Barto, 2018). In this way, the RL agent is entirely data driven and learns from direct experience. Although RL has gained attention as a tool for WRRF control optimization, numerous open questions remain about the efficacy of employing RL agents to achieve valuable process optimization outcomes in WRRFs (Croll et al., 2023a; Nam et al., 2023). To help address these open questions, the present study pursued three objectives: 1) evaluate common RL algorithms in the context of WRRF optimization; 2) evaluate the effects of increasing the number of processes controlled by the RL agent; and 3) evaluate the best-performing RL agents to better understand how successful RL control strategies compare to domain-based control strategies like ABAC.

Methods

The present study evaluated RL agent control optimization in the context of the Benchmark Simulation Model No. 1 (BSM1) (Figure 2). The BSM1, a 5-zone bioreactor with a secondary clarifier, IMLR, return activated sludge (RAS), and waste activated sludge (WAS), provides a standard model construction and testing framework to assess and benchmark WRRF control performance through a simulation environment (Alex et al., 2008). It also includes a predefined dynamic influent and a facility operational cost function. The cost function accounts for direct operational costs, including aeration energy, mixing energy, pumping energy, and biosolids disposal, as well as indirect environmental costs based on the mass of pollutants discharged to the environment. By defining the BSM1 cost function, with variations, as the RL agent reward function, agents could be rapidly trained to minimize overall facility cost and compared against 'baseline' BSM1 operation. To facilitate training, a novel RL training environment (Croll et al., 2023b) was developed using OpenAI Gym (Brockman et al., 2016) to connect a SUMO (Dynamita) BSM1 simulation to the stable-baselines3 (Raffin et al., 2021) package in Python. Four scenarios were evaluated in this study (Table 1). Scenario 1 (S1) tested four common RL algorithms, representing four major algorithm classes, to determine which was best suited to WRRF control optimization: Deep Q Network (DQN), Proximal Policy Optimization (PPO), Advantage Actor Critic (A2C), and Twin Delayed Deep Deterministic Policy Gradient (TD3) (Croll et al., 2023b). RL agents in this scenario controlled the dissolved oxygen (DO) set point for Zones 3-5 (Z3-5).
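To make the Scenario 1 training setup concrete, the sketch below shows the general shape of such an environment-plus-agent loop: a Gym-style environment whose action is the DO set point for Zones 3-5 and whose reward is the negative of an operational cost, trained with a stable-baselines3 TD3 agent. This is an illustrative sketch, not the study's code: the SUMO (Dynamita) BSM1 simulation is replaced by a hypothetical stub (fake_bsm1_step) with made-up coefficients purely so the example runs, and it uses the Gymnasium API (the maintained successor to OpenAI Gym).

```python
# Minimal sketch (not the authors' code) of the training loop described above:
# a Gym-style environment exposing DO set points for Zones 3-5 as the action,
# trained with a stable-baselines3 TD3 agent. The SUMO BSM1 simulation is
# replaced by a stub (fake_bsm1_step); every coefficient is a placeholder.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import TD3


def fake_bsm1_step(do_setpoints, rng):
    """Hypothetical stand-in for one simulated control interval.

    Returns (observations, operating_cost). A real environment would advance
    the plant model and evaluate the BSM1 cost function instead.
    """
    nh4 = np.clip(5.0 - 1.5 * do_setpoints.mean() + rng.normal(0, 0.3), 0, 10)
    no3 = np.clip(8.0 + 0.8 * do_setpoints.mean() + rng.normal(0, 0.3), 0, 20)
    aeration_energy = 2.0 * do_setpoints.sum()    # placeholder energy proxy
    effluent_penalty = 4.0 * nh4 + 1.0 * no3      # placeholder pollutant cost
    obs = np.array([nh4, no3, *do_setpoints], dtype=np.float32)
    return obs, aeration_energy + effluent_penalty


class BSM1DOEnv(gym.Env):
    """DO set-point control for Zones 3-5 with a cost-based reward."""

    def __init__(self):
        super().__init__()
        self.action_space = spaces.Box(low=0.0, high=3.0, shape=(3,), dtype=np.float32)
        self.observation_space = spaces.Box(low=0.0, high=30.0, shape=(5,), dtype=np.float32)
        self.rng = np.random.default_rng(0)
        self.t = 0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        obs, _ = fake_bsm1_step(np.array([2.0, 2.0, 2.0], dtype=np.float32), self.rng)
        return obs, {}

    def step(self, action):
        self.t += 1
        obs, cost = fake_bsm1_step(np.asarray(action, dtype=np.float32), self.rng)
        reward = -cost                    # minimizing facility cost = maximizing reward
        truncated = self.t >= 96          # e.g. one simulated day of 15-min intervals
        return obs, reward, False, truncated, {}


if __name__ == "__main__":
    env = BSM1DOEnv()
    model = TD3("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=5_000)    # tiny budget, illustration only
    obs, _ = env.reset()
    action, _ = model.predict(obs, deterministic=True)
    print("Recommended DO set points for Zones 3-5:", action)
```

TD3 requires a continuous action space, which is why the DO set points are exposed as a Box; a discrete-action algorithm such as DQN would instead need the set-point range quantized into a finite set of choices.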
Scenarios 2-4 evaluated RL agents built on the best-performing algorithm, TD3, for their capacity to control a greater number of actions throughout the BSM1 (Table 1) (Croll et al., 2024).

Results

Despite controlling only a single action, neither the DQN nor the PPO algorithm was able to produce agents that met the BSM1 effluent limits. Both the A2C and TD3 algorithms produced successful agents (Table 1) (Croll et al., 2023b). However, the actions recommended by the A2C agent were not practical for a physical system: the agent tended to alternate between very low and very high DO set points, which would cause excess wear and tear on physical equipment (Figure 3). By contrast, the TD3 agent tended to recommend actions that looked very similar to ABAC control (Figure 3). Despite the success of the TD3 agents in Scenario 1, the TD3 agent reduced the BSM1 cost by only 1.8% relative to baseline control, likely because of the limited scope of control the RL agent had over the BSM1 simulation. Increased scope of RL control was evaluated in Scenarios 2-4. As the number of actions under RL agent control increased, so did the level of BSM1 cost reduction, rising to 8.1% in Scenario 3 and 11.5% in Scenario 4 (Croll et al., 2024). However, under Scenario 4, with the addition of WAS control to the RL agent's scope, the effluent was not compliant with BSM1 limits. The RL agent correctly determined that the single largest contributor to the BSM1 operational cost was biosolids disposal and effectively ceased all sludge wasting, opting instead to pay the defined costs for environmental pollution. This finding highlights the importance of developing reward functions that accurately reflect operational goals and regulatory limits, and it suggests that the BSM1 cost function may have undervalued effluent pollution. A breakdown of RL agent actions and key process parameters during RL agent operation under Scenarios 3 and 4 is shown in Figures 4-7; these will be discussed in more detail in the final paper.
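One way to act on that lesson, sketched below under stated assumptions, is to augment a cost-based reward with an explicit penalty for effluent limit exceedance, so the agent cannot profit from trading compliance for lower disposal cost. The limit values are the standard BSM1 effluent constraints; the penalty weight, field names, and overall reward structure are illustrative assumptions, not values from the study.

```python
# Hedged sketch of a compliance-aware reward: negative operating cost minus a
# steep penalty per unit of effluent limit exceedance. Limits follow the
# standard BSM1 effluent constraints; the penalty weight is an assumption.
from dataclasses import dataclass


@dataclass
class EffluentState:
    ammonia: float         # mg N/L
    total_nitrogen: float  # mg N/L
    cod: float             # mg/L
    tss: float             # mg/L
    bod5: float            # mg/L


BSM1_LIMITS = {            # standard BSM1 effluent limits
    "ammonia": 4.0,
    "total_nitrogen": 18.0,
    "cod": 100.0,
    "tss": 30.0,
    "bod5": 10.0,
}


def reward(operating_cost: float, effluent: EffluentState,
           violation_weight: float = 1_000.0) -> float:
    """Negative cost minus a weighted penalty for each unit of exceedance."""
    exceedance = sum(
        max(0.0, getattr(effluent, name) - limit)
        for name, limit in BSM1_LIMITS.items()
    )
    return -operating_cost - violation_weight * exceedance
```

In practice the penalty weight would need tuning: too small and the agent may still find non-compliance cheaper than wasting sludge, too large and the penalty swamps the cost signal the agent is meant to minimize.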
The present study evaluated reinforcement learning (RL) agent control optimization in the context of the Benchmark Simulation Model No. 1 (BSM1). RL agents achieved minimal improvement when controlling a single action, but were able to achieve dramatic operational cost reduction when controlling a larger action space. An RL agent successfully maintained effluent limit compliance while controlling seven unique actions for a total facility cost reduction of 8.1% compared to BSM1 baseline.
Speaker: Croll, Henry
Presentation time: 08:30:00 - 08:50:00
Session time: 08:30:00 - 10:00:00
Session: Leveraging Machine Learning for Facility Operations
Session number: 509
Session location: Room 253
Topic: Advanced Level, Facility Operations and Maintenance, Intelligent Water, Municipal Wastewater Treatment Design
Author(s): Croll, Henry (1); Ikuma, Kaoru (2); Ong, Say Kee (2); Sarkar, Soumik (2)
Author affiliation(s): 1. Stantec, IA; 2. Iowa State University, IA
Source: Proceedings of the Water Environment Federation
Document type: Conference Paper
Publisher: Water Environment Federation
Print publication date: Oct 2024
DOI: 10.2175/193864718825159612
Content source: WEFTEC
Copyright: 2024
