1 of 10

UCLA SOFIA Lab, Mechanical and Aerospace Engineering

Ryan Teoh, Zhecheng Liu, Jeff Eldredge

Deep Reinforcement Learning Control of an Oscillating Hydrofoil to Maximize Power Extraction

Ref: H.R. Karbasian, J.A. Esfahani, E. Barati, The power extraction by flapping foil hydrokinetic turbine in swing arm mode, Renewable Energy, Volume 88, 2016

2 of 10

Motivation (Energy Extraction)


Ref: Paul Breeze, Chapter 14 - Marine Power Generation Technologies, in Power Generation Technologies (Third Edition), 2019, pp. 323-349.

DRIVING QUESTION:

How can we extract power from ocean waves and currents?

PHYSICAL SYSTEM:

  • Oscillating hydrofoils used to extract wave energy from water flows
  • Converts lift generated during heaving (up-down) and pitching (rotational) motions into mechanical power

GOAL:

Find the foil kinematics that maximize power extraction, as quantified by the power coefficient Cp defined below.
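For reference, a standard measure of extraction performance in the flapping-foil literature (e.g., Karbasian et al.) is the power coefficient; the expressions below are the common form, though the choice of reference length varies between studies:

```latex
% Instantaneous power extracted by the foil: lift acting through the heave
% velocity plus pitching moment acting through the pitch rate.
P(t) = F_y(t)\,\dot{h}(t) + M(t)\,\dot{\theta}(t)

% Power coefficient, normalized by fluid density, freestream speed, and a
% reference length d (e.g., the extent swept by the foil); conventions vary.
C_p(t) = \frac{P(t)}{\tfrac{1}{2}\,\rho\, U_\infty^{3}\, d}
```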

3 of 10

Problem Statement


OVERALL APPROACH:

An experiential (reinforcement learning) method that learns an optimal sequence of actions: specifically, pitching actions given a pre-set heaving motion.

RL CHALLENGES AND SOLUTIONS:

  • RL requires extensive interaction with the environment, leading to long training times and high computational cost
  • Use a reduced-order model of the flow environment to accelerate training and reduce computational demand (see the sketch below)
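A minimal sketch of what training against a learned reduced-order environment can look like, assuming a gymnasium-style interface; the class name, dimensions, reward choice, and the `latent_model` callable are illustrative assumptions, not the exact setup used in this work.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class LatentFoilEnv(gym.Env):
    """Environment whose dynamics come from a learned reduced-order (latent)
    model of the flow instead of a CFD solver, so each step is cheap."""

    def __init__(self, latent_model, episode_len=100):
        super().__init__()
        # latent_model is a user-supplied callable: (z, action) -> (z_next, cp).
        self.latent_model = latent_model
        self.episode_len = episode_len
        # Observation: 3 latent variables (as in the PA-AE here) plus the current Cp.
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(4,), dtype=np.float32)
        # Action: one pitching command per step, normalized to [-1, 1].
        self.action_space = spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        self.z = np.zeros(3, dtype=np.float32)   # illustrative initial latent state
        self.cp = 0.0
        return self._obs(), {}

    def step(self, action):
        self.z, self.cp = self.latent_model(self.z, action)
        self.t += 1
        truncated = self.t >= self.episode_len
        reward = float(self.cp)                  # assumed reward: instantaneous Cp
        return self._obs(), reward, False, truncated, {}

    def _obs(self):
        return np.append(self.z, self.cp).astype(np.float32)


# Any off-the-shelf agent can then be trained against the surrogate, e.g. with
# stable-baselines3: PPO("MlpPolicy", LatentFoilEnv(my_latent_model)).learn(200_000)
```

Because every `step` call only evaluates the surrogate model, a large number of interactions costs a small fraction of the equivalent CFD simulations, which is the point of using the reduced-order environment.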

4 of 10

Model Architecture


 

[2] Kai Fukami and Kunihiko Taira. Grasping extreme aerodynamics on a low-dimensional manifold. Nature Communications, 14(1):6480, 2023.

Figure 2: Physics-augmented autoencoder schematic structure (adapted from Fukami and Taira [2]).

5 of 10

Model Architecture (Continued)


[Schematic: the PA-AE encoder compresses the flow field into latent variables; a stacked LSTM latent dynamics model (LDM) advances the latent state given the action, the start angle, and the frequency; the PA-AE decoder reconstructs the field, shown alongside the truth for comparison.]
6 of 10

Training Data Collection


 

TRAINING RESOURCE:

  • NVIDIA A100 GPU
  • 48 hours of training

7 of 10

Autoencoder reconstruction


TRAINING DETAILS:

  • 2000 epochs
  • 225×300 vorticity field compressed to 3 latent variables (see the sketch below)
  • 100 snapshots of vorticity and Cp for each of 950 episodes
  • Solid shapes show the CFD result; hollow shapes show the AE reconstruction
  • Solid lines show the CFD Cp; dotted lines show the AE prediction
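A minimal PyTorch sketch of an autoencoder with these dimensions (225×300 field, 3 latent variables) plus an auxiliary Cp head, in the spirit of the physics-augmented autoencoder of Fukami and Taira [2]; the dense layers, layer widths, and loss weight are illustrative assumptions rather than the architecture actually trained.

```python
import torch
import torch.nn as nn

FIELD_SHAPE = (225, 300)   # vorticity snapshot resolution used here
N_LATENT = 3               # number of latent variables used here


class PhysicsAugmentedAE(nn.Module):
    """Autoencoder that compresses a vorticity snapshot to a few latent
    variables and also decodes Cp from them, so the latent space must carry
    the physics needed for power prediction (cf. Fukami & Taira [2])."""

    def __init__(self):
        super().__init__()
        n_field = FIELD_SHAPE[0] * FIELD_SHAPE[1]
        self.encoder = nn.Sequential(
            nn.Flatten(), nn.Linear(n_field, 512), nn.ReLU(),
            nn.Linear(512, N_LATENT),
        )
        self.decoder = nn.Sequential(
            nn.Linear(N_LATENT, 512), nn.ReLU(),
            nn.Linear(512, n_field), nn.Unflatten(1, FIELD_SHAPE),
        )
        self.cp_head = nn.Linear(N_LATENT, 1)   # auxiliary "physics" output

    def forward(self, field):                    # field: (batch, 225, 300)
        z = self.encoder(field)
        return self.decoder(z), self.cp_head(z), z


def pa_ae_loss(model, field, cp, beta=0.1):
    """Reconstruction loss plus a weighted Cp-prediction loss (beta is an
    illustrative weight, not the value used in this work)."""
    recon, cp_hat, _ = model(field)
    return (nn.functional.mse_loss(recon, field)
            + beta * nn.functional.mse_loss(cp_hat.squeeze(-1), cp))
```

The Cp head is what makes the latent space "physics-augmented": the three latent variables must carry enough information to both reconstruct the field and predict the power coefficient.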

8 of 10

LSTM latent variable trajectories, reconstruction


TRAINING DETAILS:

  • 3000 epochs
  • Solid lines show the true Cp and latent variables; dotted lines show the LSTM prediction
  • Latent variables, Cp, action sequence, frequency, and starting angle are fed as inputs
  • Outputs the predicted latent variables, Cp, and action sequence (see the sketch below)
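A minimal PyTorch sketch of a stacked LSTM latent dynamics model with these inputs and outputs; the hidden size, number of layers, and exact input packing are assumptions for illustration.

```python
import torch
import torch.nn as nn


class LatentDynamicsLSTM(nn.Module):
    """Stacked LSTM that advances the latent state: given sequences of latent
    variables, Cp, and actions plus the episode's frequency and starting
    angle, it predicts the latent variables and Cp at the next time steps."""

    def __init__(self, n_latent=3, n_cond=2, hidden=128, layers=2):
        super().__init__()
        # Per-step input: latent (3) + Cp (1) + action (1) + conditioning (2).
        self.lstm = nn.LSTM(n_latent + 1 + 1 + n_cond, hidden,
                            num_layers=layers, batch_first=True)
        self.head = nn.Linear(hidden, n_latent + 1)   # next latent + next Cp

    def forward(self, z, cp, action, cond):
        # z: (B, T, 3), cp: (B, T, 1), action: (B, T, 1), cond: (B, 2)
        cond_seq = cond.unsqueeze(1).expand(-1, z.shape[1], -1)
        x = torch.cat([z, cp, action, cond_seq], dim=-1)
        h, _ = self.lstm(x)
        out = self.head(h)
        return out[..., :-1], out[..., -1:]            # predicted latent, Cp
```

At rollout time the model is fed its own predictions autoregressively, so keeping the latent dimension small (3 here) helps keep the sequence model compact and the error growth manageable.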

9 of 10

Reinforcement learning agent Cp


 

Reward: defined piecewise, with one expression applied when the episode is truncated and another when it is not (see the sketch below).
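As a hedged sketch of that piecewise structure only; the expressions in both branches below are placeholders, not the reward actually used in this work.

```python
def reward(cp_step, cp_cumulative, truncated):
    """Piecewise reward: one branch while the episode is running, another
    once it is truncated. Both expressions are illustrative placeholders."""
    if truncated:
        return cp_cumulative      # e.g., credit accumulated power at episode end
    return cp_step                # e.g., instantaneous power coefficient otherwise
```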

 

 

 

 

10 of 10

Conclusions


ACKNOWLEDGEMENTS:

I would like to thank UC Leads for their support throughout my work, as well as my advisors, Zhecheng Liu and Jeff Eldredge.

NEXT STEPS:

  • Train RL agent on more periods
  • Compare agent with sinusoidal motion
  • Tune RL agent and test on cases outside training range

REFERENCES:

Liu, Z., Beckers, D., & Eldredge, J. D. (2025). Model-based reinforcement learning for control of strongly-disturbed unsteady aerodynamic flows. AIAA Journal.

Beckers, D., & Eldredge, J. D. (2024). Deep reinforcement learning of airfoil pitch control in a highly disturbed environment using partial observations. Physical Review Fluids.

SUMMARY:

  • Showed that flow fields can be represented in a low-dimensional latent model
  • Can predict how the flow develops under a given action sequence with high accuracy in the latent space
  • Trained the RL agent in the latent space, dramatically increasing training speed and reducing computational requirements
  • Can quickly tune the agent without restarting the full training process (the environment model stays constant)