1 of 10

UCLA SOFIA Lab, Mechanical and Aerospace Engineering

Ryan Teoh, Zhecheng Liu, Jeff Eldredge

Deep Reinforcement Learning Control of an Oscillating Hydrofoil to Maximize Power Extraction

Ref: H.R. Karbasian, J.A. Esfahani, E. Barati, The power extraction by flapping foil hydrokinetic turbine in swing arm mode, Renewable Energy, Volume 88, 2016

2 of 10

Motivation (Energy Extraction)


Ref: Paul Breeze, Chapter 14 - Marine Power Generation Technologies, in Power Generation Technologies (Third Edition), 2019, pp. 323-349.

DRIVING QUESTION:

How can we extract power from ocean waves and currents?

PHYSICAL SYSTEM:

  • Oscillating hydrofoils used to extract wave energy from water flows
  • Converts lift generated during heaving (up-down) and pitching (rotational) motions into mechanical power

GOAL:

Find the foil kinematics that maximize power extraction, as quantified by the power coefficient Cp defined below.
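For reference, a standard measure of extraction performance in the flapping-foil literature (e.g., Karbasian et al.) is the power coefficient; the expressions below are the common form, though the choice of reference length varies between studies:

```latex
% Instantaneous power extracted by the foil: lift acting through the heave
% velocity plus pitching moment acting through the pitch rate.
P(t) = F_y(t)\,\dot{h}(t) + M(t)\,\dot{\theta}(t)

% Power coefficient, normalized by fluid density, freestream speed, and a
% reference length d (e.g., the extent swept by the foil); conventions vary.
C_p(t) = \frac{P(t)}{\tfrac{1}{2}\,\rho\, U_\infty^{3}\, d}
```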

3 of 10

Problem Statement


OVERALL APPROACH:

An experiential (reinforcement learning) method that learns an optimal sequence of actions: specifically, pitching actions given a pre-set heaving motion.

RL CHALLENGES AND SOLUTIONS:

  • RL requires extensive interaction with the environment, leading to long training times and high computational cost
  • Use a reduced-order model of the flow environment to accelerate training and reduce computational demand (see the sketch below)
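A minimal sketch of what training against a learned reduced-order environment can look like, assuming a gymnasium-style interface; the class name, dimensions, reward choice, and the `latent_model` callable are illustrative assumptions, not the exact setup used in this work.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class LatentFoilEnv(gym.Env):
    """Environment whose dynamics come from a learned reduced-order (latent)
    model of the flow instead of a CFD solver, so each step is cheap."""

    def __init__(self, latent_model, episode_len=100):
        super().__init__()
        # latent_model is a user-supplied callable: (z, action) -> (z_next, cp).
        self.latent_model = latent_model
        self.episode_len = episode_len
        # Observation: 3 latent variables (as in the PA-AE here) plus the current Cp.
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(4,), dtype=np.float32)
        # Action: one pitching command per step, normalized to [-1, 1].
        self.action_space = spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        self.z = np.zeros(3, dtype=np.float32)   # illustrative initial latent state
        self.cp = 0.0
        return self._obs(), {}

    def step(self, action):
        self.z, self.cp = self.latent_model(self.z, action)
        self.t += 1
        truncated = self.t >= self.episode_len
        reward = float(self.cp)                  # assumed reward: instantaneous Cp
        return self._obs(), reward, False, truncated, {}

    def _obs(self):
        return np.append(self.z, self.cp).astype(np.float32)


# Any off-the-shelf agent can then be trained against the surrogate, e.g. with
# stable-baselines3: PPO("MlpPolicy", LatentFoilEnv(my_latent_model)).learn(200_000)
```

Because every `step` call only evaluates the surrogate model, a large number of interactions costs a small fraction of the equivalent CFD simulations, which is the point of using the reduced-order environment.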

4 of 10

Model Architecture


 

[2] Kai Fukami and Kunihiko Taira. Grasping extreme aerodynamics on a low-dimensional manifold. Nature Communications, 14(1):6480, 2023.

Figure 2: Physics-augmented autoencoder schematic structure (adapted from Fukami and Taira [2]).

5 of 10

Model Architecture (Continued)


[Schematic: the PA-AE encoder compresses the flow field into latent variables; a stacked LSTM latent dynamics model (LDM) advances the latent state given the action, the start angle, and the frequency; the PA-AE decoder reconstructs the field, shown alongside the truth for comparison.]
6 of 10

Training Data Collection


 

TRAINING RESOURCE:

  • NVIDIA A100 GPU
  • 48 hours of training

7 of 10

Autoencoder reconstruction


TRAINING DETAILS:

  • 2000 epochs
  • 225×300 vorticity field compressed to 3 latent variables (see the sketch below)
  • 100 snapshots of vorticity and Cp for each of 950 episodes
  • Solid shapes show the CFD result; hollow shapes show the AE reconstruction
  • Solid lines show the CFD Cp; dotted lines show the AE prediction
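A minimal PyTorch sketch of an autoencoder with these dimensions (225×300 field, 3 latent variables) plus an auxiliary Cp head, in the spirit of the physics-augmented autoencoder of Fukami and Taira [2]; the dense layers, layer widths, and loss weight are illustrative assumptions rather than the architecture actually trained.

```python
import torch
import torch.nn as nn

FIELD_SHAPE = (225, 300)   # vorticity snapshot resolution used here
N_LATENT = 3               # number of latent variables used here


class PhysicsAugmentedAE(nn.Module):
    """Autoencoder that compresses a vorticity snapshot to a few latent
    variables and also decodes Cp from them, so the latent space must carry
    the physics needed for power prediction (cf. Fukami & Taira [2])."""

    def __init__(self):
        super().__init__()
        n_field = FIELD_SHAPE[0] * FIELD_SHAPE[1]
        self.encoder = nn.Sequential(
            nn.Flatten(), nn.Linear(n_field, 512), nn.ReLU(),
            nn.Linear(512, N_LATENT),
        )
        self.decoder = nn.Sequential(
            nn.Linear(N_LATENT, 512), nn.ReLU(),
            nn.Linear(512, n_field), nn.Unflatten(1, FIELD_SHAPE),
        )
        self.cp_head = nn.Linear(N_LATENT, 1)   # auxiliary "physics" output

    def forward(self, field):                    # field: (batch, 225, 300)
        z = self.encoder(field)
        return self.decoder(z), self.cp_head(z), z


def pa_ae_loss(model, field, cp, beta=0.1):
    """Reconstruction loss plus a weighted Cp-prediction loss (beta is an
    illustrative weight, not the value used in this work)."""
    recon, cp_hat, _ = model(field)
    return (nn.functional.mse_loss(recon, field)
            + beta * nn.functional.mse_loss(cp_hat.squeeze(-1), cp))
```

The Cp head is what makes the latent space "physics-augmented": the three latent variables must carry enough information to both reconstruct the field and predict the power coefficient.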

8 of 10

LSTM latent variable trajectories, reconstruction


TRAINING DETAILS:

  • 3000 epochs
  • Solid lines show the true Cp and latent variables; dotted lines show the LSTM prediction
  • Latent variables, Cp, action sequence, frequency, and starting angle are fed as inputs
  • Outputs the predicted latent variables, Cp, and action sequence (see the sketch below)
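A minimal PyTorch sketch of a stacked LSTM latent dynamics model with these inputs and outputs; the hidden size, number of layers, and exact input packing are assumptions for illustration.

```python
import torch
import torch.nn as nn


class LatentDynamicsLSTM(nn.Module):
    """Stacked LSTM that advances the latent state: given sequences of latent
    variables, Cp, and actions plus the episode's frequency and starting
    angle, it predicts the latent variables and Cp at the next time steps."""

    def __init__(self, n_latent=3, n_cond=2, hidden=128, layers=2):
        super().__init__()
        # Per-step input: latent (3) + Cp (1) + action (1) + conditioning (2).
        self.lstm = nn.LSTM(n_latent + 1 + 1 + n_cond, hidden,
                            num_layers=layers, batch_first=True)
        self.head = nn.Linear(hidden, n_latent + 1)   # next latent + next Cp

    def forward(self, z, cp, action, cond):
        # z: (B, T, 3), cp: (B, T, 1), action: (B, T, 1), cond: (B, 2)
        cond_seq = cond.unsqueeze(1).expand(-1, z.shape[1], -1)
        x = torch.cat([z, cp, action, cond_seq], dim=-1)
        h, _ = self.lstm(x)
        out = self.head(h)
        return out[..., :-1], out[..., -1:]            # predicted latent, Cp
```

At rollout time the model is fed its own predictions autoregressively, so keeping the latent dimension small (3 here) helps keep the sequence model compact and the error growth manageable.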

9 of 10

Reinforcement learning agent Cp


 

Reward: defined piecewise, with one expression applied when the episode is truncated and another when it is not (see the sketch below).
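As a hedged sketch of that piecewise structure only; the expressions in both branches below are placeholders, not the reward actually used in this work.

```python
def reward(cp_step, cp_cumulative, truncated):
    """Piecewise reward: one branch while the episode is running, another
    once it is truncated. Both expressions are illustrative placeholders."""
    if truncated:
        return cp_cumulative      # e.g., credit accumulated power at episode end
    return cp_step                # e.g., instantaneous power coefficient otherwise
```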

 

 

 

 

10 of 10

Conclusions


ACKNOWLEDGEMENTS:

I would like to thank UC Leads for their support throughout my work, as well as my advisors, Zhecheng Liu and Jeff Eldredge.

NEXT STEPS:

  • Train RL agent on more periods
  • Compare agent with sinusoidal motion
  • Tune RL agent and test on cases outside training range

REFERENCES:

Liu, Z., Beckers, D., & Eldredge, J. D. (2025). Model-based reinforcement learning for control of strongly-disturbed unsteady aerodynamic flows. AIAA Journal.

Beckers, D., & Eldredge, J. D. (2024). Deep reinforcement learning of airfoil pitch control in a highly disturbed environment using partial observations. Physical Review Fluids.

SUMMARY:

  • Showed that flow fields can be represented in a low-dimensional latent model
  • Can predict how the flow develops under a given action sequence with high accuracy in the latent space
  • Trained the RL agent in the latent space, dramatically increasing training speed and reducing computational requirements
  • Can quickly tune the agent without restarting the full training process (the environment model stays constant)