1 of 18

Neural Pontryagin Optimal Controller for Lossy Energy Storage with Nonlinear Efficiency

Chengyang Gu, HKUST (Guangzhou)

Yize Chen, University of Alberta

yize9@ualberta.ca

2 of 18

Outline

Background and Motivation

Introduction

Method

Simulation Results

Conclusions and Future Work


3 of 18

Motivation

Rapid growth of grid-scale energy storage

Found applications in mitigating load and renewable fluctuations, reducing price volatility, enhancing resilience under extreme weather

Challenge: Nonlinear, unknown battery efficiency

Batteries have distinct charging/discharging curves, which may be unknown to operators


4 of 18

Motivation-Battery Control Challenge


5 of 18

Motivation-Current Methods’ Challenge

Dynamic Programming / Model Predictive Control (MPC):

Struggle with nonlinear, unknown dynamics

Model-Free RL:

Strong performance, but poor sample efficiency
Limited interpretability, safety concerns

Model-Based RL:

Better sample efficiency
Still lacks a tractable mechanism for optimality


6 of 18

Introduction-Overall Framework

Novel framework: Neural-PMP integrates Pontryagin's Maximum Principle (PMP) with neural network–learned dynamics

New algorithm: a gradient-based method that solves the PMP conditions efficiently

Improved Performance:

Higher sample efficiency than model-free RL
Safer solutions with fewer constraint violations
Outperforms linearized MPC and Random Shooting-MPC
Extends naturally to multi-battery systems


7 of 18

Introduction


8 of 18

Problem Formulation

Battery arbitrage as an optimal control problem: Objectives

(1) Charging cost

(2) Penalty for excessive (dis)charging amount u_t

(3) Penalty term acting as a soft constraint that keeps the battery from exceeding its limits

Battery arbitrage constraints on state and controls
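
The three objective terms above can be sketched as a single per-step cost. The slide's own formulas are not reproduced here, so every functional form below is an assumption for illustration: a price-times-energy charging cost, a quadratic penalty on u_t, and a squared hinge penalty on the state of charge. The names `stage_cost`, `alpha`, `beta`, `soc_min`, and `soc_max` are hypothetical.

```python
def stage_cost(price, u, soc, alpha=0.1, beta=10.0, soc_min=0.0, soc_max=1.0):
    """Assumed per-step arbitrage cost; u > 0 charges, u < 0 discharges."""
    # (1) Charging cost: pay when charging, earn when discharging.
    charging_cost = price * u
    # (2) Penalty for excessive (dis)charging amount (assumed quadratic).
    action_penalty = alpha * u ** 2
    # (3) Soft constraint: zero inside [soc_min, soc_max], grows outside (assumed squared hinge).
    over = max(0.0, soc - soc_max)
    under = max(0.0, soc_min - soc)
    limit_penalty = beta * (over ** 2 + under ** 2)
    return charging_cost + action_penalty + limit_penalty
```

With these assumed forms, discharging at a positive price gives a negative cost, that is, a profit, and the limit penalty only activates once the state of charge leaves its allowed range.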


9 of 18

Problem Formulation: Nonlinear Efficiency

Real batteries incur efficiency losses:

Efficiency decreases at higher charging rates

Efficiency is nonlinear and battery-specific due to electrochemical properties:

Examples: sigmoid, piecewise linear, quadratic forms

The battery's charging efficiency function is not explicitly known to users or to the controller, which motivates using a neural network to approximate the dynamics
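
To make the three example shapes concrete, here is a minimal sketch of efficiency curves that fall as the (dis)charging rate grows. All coefficients, the floor `ETA_FLOOR`, and the state-of-charge update `next_soc` are assumptions chosen for illustration, not values from the talk.

```python
import math

ETA_FLOOR = 0.05  # assumed lower bound so that dividing by the efficiency is always safe

def eta_quadratic(u, a=0.3):
    """Quadratic form: efficiency drops with the square of the rate."""
    return max(ETA_FLOOR, 1.0 - a * u * u)

def eta_piecewise(u, knee=0.5, eta_flat=0.95, slope=0.4):
    """Piecewise-linear form: flat up to a knee, then linearly decreasing."""
    rate = abs(u)
    if rate <= knee:
        return eta_flat
    return max(ETA_FLOOR, eta_flat - slope * (rate - knee))

def eta_sigmoid(u, eta_min=0.6, eta_max=0.98, k=6.0, mid=0.6):
    """Sigmoid form: smooth transition from eta_max to eta_min around `mid`."""
    return eta_min + (eta_max - eta_min) / (1.0 + math.exp(k * (abs(u) - mid)))

def next_soc(soc, u, eta):
    """Assumed lossy update: charging stores eta*u, discharging draws u/eta."""
    e = eta(u)
    if u >= 0:
        return soc + e * u
    return soc + u / e
```

For example, `next_soc(0.5, 0.2, eta_quadratic)` stores slightly less than the 0.2 requested, while `next_soc(0.5, -0.2, eta_quadratic)` drains slightly more than 0.2.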


10 of 18

PMP Conditions

In classical optimal control theory, the PMP conditions are necessary conditions that the optimal control actions must satisfy:

Here H(·) is the Hamiltonian and λ is the costate. The optimality conditions are:
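
For reference, the standard textbook discrete-time form of these conditions is given below. It may differ from the slide's notation and sign convention. It assumes dynamics x_{t+1} = f(x_t, u_t), a stage cost c_t, a terminal cost Φ, a cost-minimization convention, and controls in the interior of the feasible set; all of these symbols are assumptions.

```latex
\begin{aligned}
H_t(x_t, u_t, \lambda_{t+1}) &= c_t(x_t, u_t) + \lambda_{t+1}^{\top} f(x_t, u_t), \\
x_{t+1} &= \frac{\partial H_t}{\partial \lambda_{t+1}} = f(x_t, u_t), \qquad x_0 \text{ given}, \\
\lambda_t &= \frac{\partial H_t}{\partial x_t}, \qquad \lambda_T = \frac{\partial \Phi(x_T)}{\partial x_T}, \\
0 &= \frac{\partial H_t}{\partial u_t}, \qquad t = 0, \dots, T-1.
\end{aligned}
```

The first line defines the Hamiltonian, the second is the state equation run forward in time, the third is the costate equation run backward from the terminal condition, and the last is the stationarity condition on the control.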


11 of 18

PMP with Neural Dynamics: Learning

Key step: Train an NN-parameterized battery dynamics model and plug it into the PMP conditions (Line 5 in Algorithm)


This is achievable because battery measurements are always available!

Battery Measurements → NN Surrogate Dynamics → PMP Conditions for Optimal Control → Optimal Control Sequences
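
The learning step can be sketched as a small regression problem: fit a network to measured (action, change in state of charge) pairs, then expose its derivative so the PMP conditions can use it. This is a toy illustration in pure Python, not the authors' implementation. The hidden battery `true_delta_soc`, the one-hidden-layer tanh network, and all hyperparameters are assumptions.

```python
import math
import random

def true_delta_soc(u):
    # Hypothetical "real" battery, used only to generate measurements.
    return (1.0 - 0.3 * u * u) * u

def init_params(hidden, rng):
    return {"w1": [rng.uniform(-1, 1) for _ in range(hidden)],
            "b1": [0.0] * hidden,
            "w2": [rng.uniform(-0.5, 0.5) for _ in range(hidden)],
            "b2": 0.0}

def predict(p, u):
    """Return the predicted SoC change and the hidden activations."""
    h = [math.tanh(w * u + b) for w, b in zip(p["w1"], p["b1"])]
    return sum(w * hi for w, hi in zip(p["w2"], h)) + p["b2"], h

def surrogate_slope(p, u):
    """Derivative of the prediction with respect to u, as needed by the PMP conditions."""
    _, h = predict(p, u)
    return sum(w2 * (1.0 - hi * hi) * w1 for w1, w2, hi in zip(p["w1"], p["w2"], h))

def mse(p, data):
    return sum((predict(p, u)[0] - y) ** 2 for u, y in data) / len(data)

def sgd_step(p, u, y, lr):
    y_hat, h = predict(p, u)
    err = y_hat - y  # gradient of 0.5 * err**2 with respect to y_hat
    for i, hi in enumerate(h):
        grad_pre = err * p["w2"][i] * (1.0 - hi * hi)  # backpropagate through tanh
        p["w2"][i] -= lr * err * hi
        p["w1"][i] -= lr * grad_pre * u
        p["b1"][i] -= lr * grad_pre
    p["b2"] -= lr * err

rng = random.Random(0)
data = [(u, true_delta_soc(u)) for u in (rng.uniform(-1, 1) for _ in range(200))]
params = init_params(8, rng)
loss_before = mse(params, data)
for _ in range(200):
    for u, y in data:
        sgd_step(params, u, y, lr=0.02)
loss_after = mse(params, data)
```

Only input-output measurements are needed to train the surrogate; the efficiency curve itself is never given to the learner.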

12 of 18

PMP with Neural Dynamics: Control

Once the dynamics model is learned, take gradient steps to iteratively optimize the control sequence u
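
A minimal sketch of such a gradient iteration follows: roll the state forward, propagate the costate backward, and step each u_t along the negative of ∂H/∂u_t. The stand-in surrogate `g`, the cost terms, the clipping to `u_max`, and all constants are assumptions for illustration. In Neural-PMP the trained network would take the place of `g` and `dg_du`.

```python
def g(u):
    # Stand-in for the learned SoC change per step (hypothetical surrogate).
    return (1.0 - 0.3 * u * u) * u

def dg_du(u):
    return 1.0 - 0.9 * u * u

def pen(x, beta):
    # Soft constraint keeping the state of charge in [0, 1] (assumed squared hinge).
    return beta * (max(0.0, x - 1.0) ** 2 + max(0.0, -x) ** 2)

def dpen_dx(x, beta):
    return 2.0 * beta * (max(0.0, x - 1.0) - max(0.0, -x))

def rollout(x0, us):
    xs = [x0]
    for u in us:
        xs.append(xs[-1] + g(u))  # state equation x_{t+1} = x_t + g(u_t)
    return xs

def total_cost(x0, us, prices, alpha, beta, gamma, x_target):
    xs = rollout(x0, us)
    running = sum(p * u + alpha * u * u + pen(x, beta)
                  for p, u, x in zip(prices, us, xs))
    return running + gamma * (xs[-1] - x_target) ** 2

def pmp_gradient(x0, us, prices, alpha, beta, gamma, x_target):
    xs = rollout(x0, us)                      # forward pass
    lam = 2.0 * gamma * (xs[-1] - x_target)   # terminal costate lambda_T
    grads = [0.0] * len(us)
    for t in reversed(range(len(us))):        # backward pass
        # dH_t/du_t, with lam currently holding lambda_{t+1}
        grads[t] = prices[t] + 2.0 * alpha * us[t] + lam * dg_du(us[t])
        # costate equation: lambda_t = dH_t/dx_t (df/dx = 1 for this model)
        lam = dpen_dx(xs[t], beta) + lam
    return grads

def optimize(x0, prices, alpha=0.1, beta=2.0, gamma=2.0, x_target=0.5,
             lr=0.01, iters=300, u_max=1.0):
    us = [0.0] * len(prices)
    for _ in range(iters):
        grads = pmp_gradient(x0, us, prices, alpha, beta, gamma, x_target)
        us = [max(-u_max, min(u_max, u - lr * gr)) for u, gr in zip(us, grads)]
    return us

prices = [1.0, 0.5, 2.0, 3.0]
us_opt = optimize(0.5, prices)
cost_init = total_cost(0.5, [0.0] * 4, prices, 0.1, 2.0, 2.0, 0.5)
cost_opt = total_cost(0.5, us_opt, prices, 0.1, 2.0, 2.0, 0.5)
```

Each iteration costs one forward rollout and one backward costate sweep, so the gradient for the whole control sequence is obtained without perturbing each u_t separately.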


13 of 18

Neural PMP’s Properties


14 of 18

Simulation Setups

Benchmarked Algorithms:

i) A Linear-Convex solver, which first uses linear regression to fit linearized system dynamics and then solves the MPC problem with the CVXPY solver;

ii) The state-of-the-art model-free RL algorithm Proximal Policy Optimization (PPO);

iii) A model-based random shooting MPC (RS-MPC) controller.

We test in both single-battery and multi-battery settings.
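
For readers unfamiliar with baseline iii), random shooting can be sketched as follows: sample many candidate control sequences, score each with a dynamics model, and keep the cheapest. This is a generic illustration, not the benchmark configuration used in the talk. The model `step`, the cost terms, the sample count, and the seed are assumptions.

```python
import random

def step(soc, u):
    # Hypothetical dynamics model with rate-dependent losses.
    return soc + (1.0 - 0.3 * u * u) * u

def sequence_cost(soc0, us, prices, alpha=0.1, beta=2.0):
    soc, cost = soc0, 0.0
    for p, u in zip(prices, us):
        over, under = max(0.0, soc - 1.0), max(0.0, -soc)
        cost += p * u + alpha * u * u + beta * (over ** 2 + under ** 2)
        soc = step(soc, u)
    return cost

def random_shooting(soc0, prices, n_samples=500, u_max=1.0, seed=0):
    rng = random.Random(seed)
    best_us, best_cost = None, float("inf")
    for _ in range(n_samples):
        # Draw one candidate control sequence uniformly within the action bounds.
        us = [rng.uniform(-u_max, u_max) for _ in prices]
        c = sequence_cost(soc0, us, prices)
        if c < best_cost:
            best_us, best_cost = us, c
    return best_us, best_cost

prices = [1.0, 0.5, 2.0, 3.0]
best_us, best_cost = random_shooting(0.5, prices)
first_action = best_us[0]  # MPC applies only the first action, then replans
```

Because the candidates are sampled blindly, solution quality depends on the number of samples, in contrast to a gradient-based search guided by the PMP conditions.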


15 of 18

Simulation Results-Single Battery

Observations:

Neural-PMP is the only profitable controller
Superior sample efficiency
Zero constraint violations


16 of 18

Simulation Results-Multiple Batteries

Neural-PMP extends naturally to multi-battery case;

All batteries reach target SOC;
Profit maximized under price variations;
Scales well with more battery units.


17 of 18

Conclusions and Future Work

Neural-PMP integrates PMP with NN dynamics for optimal battery control

Advantages:

Sample-efficient
Safer and interpretable solutions
Outperforms state-of-the-art MPC, PPO, and RS-MPC

Future work:

Effect of NN modeling errors
Policy initialization strategies
Theoretical guarantees under uncertainty


18 of 18

Thank you!

Questions? yize.chen@ualberta.ca

https://github.com/ChengyangGU/NeuralPMP2024
