1 of 18

Neural Pontryagin Optimal Controller for Lossy Energy Storage with Nonlinear Efficiency

Chengyang Gu, HKUST (Guangzhou)

Yize Chen, University of Alberta

yize9@ualberta.ca

2 of 18

Outline

Background and Motivation

Introduction

Method

Simulation Results

Conclusion and Future Work

3 of 18

Motivation

  • Rapid growth of grid-scale energy storage

  • Applications include mitigating load and renewable fluctuations, reducing price volatility, and enhancing resilience under extreme weather

  • Challenge: nonlinear, unknown battery efficiency
    • Each battery has distinct charging/discharging curves, which may be unknown to operators

4 of 18

Motivation: Battery Control Challenge

  • Challenge: nonlinear, unknown battery efficiency
    • Each battery has distinct charging/discharging curves, which may be unknown to operators

5 of 18

Motivation: Challenges of Current Methods

Dynamic Programming / Model Predictive Control (MPC):

  • Struggle with nonlinear, unknown dynamics

Model-Free RL:

  • Strong performance, but poor sample efficiency
  • Limited interpretability, safety concerns

Model-Based RL:

  • Better sample efficiency
  • Still lacks a tractable mechanism for optimality

6 of 18

Introduction: Overall Framework

In this work, we propose a novel framework, Neural-PMP, which integrates the Pontryagin Maximum Principle (PMP) with neural network-learned dynamics.

New algorithm: a gradient-based method that solves the PMP conditions efficiently.

Improved Performance:

  • Higher sample efficiency than model-free RL
  • Safer solutions with fewer constraint violations
  • Outperforms linearized MPC and Random Shooting-MPC
  • Extends naturally to multi-battery systems

7 of 18

Introduction

8 of 18

Problem Formulation

Battery arbitrage as an optimal control problem. Objective terms:

(1) Charging cost

(2) Penalty for an excessive (dis)charging amount u_t

(3) Penalty term acting as a soft constraint that keeps the battery within its limits

Battery arbitrage constraints on state and controls
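The equations on this slide are not reproduced here. As a hedged illustration only, one way to write an objective with the three listed terms is shown below. The price p_t, the weights α and β, the quadratic and squared-hinge penalty forms, the bounds x_min, x_max, u_min, u_max, and the generic dynamics f are all assumptions, not the authors' exact formulation.

```latex
\begin{aligned}
\min_{u_0,\dots,u_{T-1}} \quad & \sum_{t=0}^{T-1} \Big[ \underbrace{p_t u_t}_{(1)} + \underbrace{\alpha u_t^2}_{(2)} + \underbrace{\beta \big( \max(0, x_t - x_{\max})^2 + \max(0, x_{\min} - x_t)^2 \big)}_{(3)} \Big] \\
\text{s.t.} \quad & x_{t+1} = f(x_t, u_t), \qquad u_{\min} \le u_t \le u_{\max}
\end{aligned}
```

Here x_t is the state of charge and u_t > 0 means charging. Term (3) replaces a hard state constraint with a differentiable penalty, which is what allows gradient-based solution methods.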

9 of 18

Problem Formulation: Nonlinear Efficiency

Real batteries incur efficiency losses:

  • Efficiency decreases at higher charging rates

Efficiency is nonlinear and battery-specific due to electrochemical properties.

Examples: sigmoid, piecewise linear, quadratic forms

The battery's charging efficiency function is not known explicitly to users or the controller, which motivates approximating the dynamics with a neural network.
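To make the three example forms concrete, here is a minimal Python sketch. All parameter values (eta_max, eta_min, k, u_mid, u_knee, c) and the one-step SOC update in next_soc are illustrative assumptions, not values or models taken from the slides.

```python
import math

def sigmoid_efficiency(u, eta_max=0.95, eta_min=0.75, k=8.0, u_mid=0.5):
    """Efficiency falls smoothly from eta_max toward eta_min as |u| grows."""
    s = 1.0 / (1.0 + math.exp(-k * (abs(u) - u_mid)))
    return eta_max - (eta_max - eta_min) * s

def piecewise_linear_efficiency(u, eta_max=0.95, eta_min=0.75, u_knee=0.3, u_max=1.0):
    """Constant up to u_knee, then a linear decline to eta_min at u_max."""
    a = abs(u)
    if a <= u_knee:
        return eta_max
    frac = min((a - u_knee) / (u_max - u_knee), 1.0)
    return eta_max - (eta_max - eta_min) * frac

def quadratic_efficiency(u, eta_max=0.95, c=0.2):
    """Efficiency drops with the square of the (dis)charging rate."""
    return eta_max - c * u * u

def next_soc(x, u, efficiency):
    """Assumed lossy update: charging stores eta*u, discharging drains u/eta."""
    eta = efficiency(u)
    return x + eta * u if u >= 0 else x + u / eta
```

In all three cases the stored energy is a nonlinear function of u, so a controller that assumes a constant efficiency will mispredict the state of charge at high rates.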

10 of 18

PMP Conditions

In classical optimal control theory, the PMP conditions are necessary conditions that the optimal control actions must satisfy. They are stated in terms of the Hamiltonian H(·) and the costate λ.
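For reference, a standard discrete-time statement of these conditions for a minimization problem is given below. The notation (stage cost ℓ_t, dynamics f, terminal cost Φ, horizon T) is assumed for illustration and may differ from the notation on the original slide.

```latex
\begin{aligned}
H_t(x_t, u_t, \lambda_{t+1}) &= \ell_t(x_t, u_t) + \lambda_{t+1}^{\top} f(x_t, u_t) \\
x_{t+1} &= \frac{\partial H_t}{\partial \lambda_{t+1}} = f(x_t, u_t), \qquad x_0 \text{ given} \\
\lambda_t &= \frac{\partial H_t}{\partial x_t} = \frac{\partial \ell_t}{\partial x_t} + \Big(\frac{\partial f}{\partial x_t}\Big)^{\top} \lambda_{t+1}, \qquad \lambda_T = \frac{\partial \Phi}{\partial x_T} \\
0 &= \frac{\partial H_t}{\partial u_t} = \frac{\partial \ell_t}{\partial u_t} + \Big(\frac{\partial f}{\partial u_t}\Big)^{\top} \lambda_{t+1}
\end{aligned}
```

The last line is the stationarity condition for an unconstrained control. With control bounds, u_t instead minimizes H_t over the admissible set.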

11 of 18

PMP with Neural Dynamics: Learning

Key step: train the NN-parameterized battery dynamics and plug them into the PMP conditions (Line 5 in Algorithm)

This is achievable because battery measurements are always available!

Battery Measurements → NN Surrogate Dynamics → PMP Conditions for Optimal Control → Optimal Control Sequences
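The learning step can be sketched as a small regression problem: fit a network that maps the control u to the measured change in state of charge. The sketch below is dependency-free Python with manual backpropagation. The synthetic "measurements" from true_delta_soc, the network size, and the training hyperparameters are all assumptions for illustration; a practical implementation would use real battery data and an autodiff framework.

```python
import math
import random

def true_delta_soc(u):
    """Hypothetical ground-truth battery, standing in for real measurements."""
    eta = 0.95 - 0.2 * u * u          # assumed quadratic efficiency loss
    return eta * u if u >= 0 else u / eta

def init_params(hidden, rng):
    """One hidden tanh layer with scalar input and scalar output."""
    return {
        "w1": [rng.uniform(-1, 1) for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(hidden)],
        "b2": 0.0,
    }

def predict(p, u):
    """Surrogate prediction of the SOC increment for control u."""
    h = [math.tanh(w * u + b) for w, b in zip(p["w1"], p["b1"])]
    return sum(w2 * hi for w2, hi in zip(p["w2"], h)) + p["b2"]

def mse(p, data):
    return sum((predict(p, u) - y) ** 2 for u, y in data) / len(data)

def sgd_step(p, u, y, lr):
    """One SGD update on the loss 0.5 * (prediction - y)^2."""
    h = [math.tanh(w * u + b) for w, b in zip(p["w1"], p["b1"])]
    err = sum(w2 * hi for w2, hi in zip(p["w2"], h)) + p["b2"] - y
    for j, hj in enumerate(h):
        grad_pre = err * p["w2"][j] * (1.0 - hj * hj)  # backprop through tanh
        p["w2"][j] -= lr * err * hj
        p["w1"][j] -= lr * grad_pre * u
        p["b1"][j] -= lr * grad_pre
    p["b2"] -= lr * err

def fit_surrogate(n_samples=200, hidden=8, epochs=200, lr=0.05, seed=0):
    """Fit the surrogate; returns (params, loss before, loss after)."""
    rng = random.Random(seed)
    controls = [rng.uniform(-1, 1) for _ in range(n_samples)]
    data = [(u, true_delta_soc(u)) for u in controls]
    params = init_params(hidden, rng)
    loss_before = mse(params, data)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, y in data:
            sgd_step(params, u, y, lr)
    return params, loss_before, mse(params, data)
```

Calling fit_surrogate() returns the trained parameters together with the mean squared error before and after training, so the fit quality can be checked before the model is used for control.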

12 of 18

PMP with Neural Dynamics: Control

Once the dynamics model is learned, gradient steps iteratively optimize the control sequence u
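A minimal sketch of such an iteration for a single battery follows: roll the state forward, propagate the costate backward, and take a projected gradient step on each u_t using ∂H/∂u_t. This is an illustration under stated assumptions, not the authors' algorithm: g_model is a fixed stand-in for the learned network, derivatives use finite differences instead of autodiff, and the cost weights, bounds, terminal target, and step size are all hypothetical.

```python
X_MIN, X_MAX, X_TARGET = 0.0, 1.0, 0.5   # assumed SOC limits and terminal target
ALPHA, BETA, GAMMA = 0.1, 5.0, 1.0       # assumed penalty weights

def g_model(u):
    """Stand-in for the learned surrogate of the SOC increment."""
    return (0.95 - 0.2 * u * u) * u

def dg_du(u, eps=1e-5):
    """Central finite difference; a real implementation would use autodiff."""
    return (g_model(u + eps) - g_model(u - eps)) / (2 * eps)

def stage_cost(x, u, price):
    over = max(0.0, x - X_MAX)
    under = max(0.0, X_MIN - x)
    return price * u + ALPHA * u * u + BETA * (over * over + under * under)

def rollout(x0, us):
    """State equation: x_{t+1} = x_t + g(u_t)."""
    xs = [x0]
    for u in us:
        xs.append(xs[-1] + g_model(u))
    return xs

def total_cost(x0, us, prices):
    xs = rollout(x0, us)
    running = sum(stage_cost(x, u, p) for x, u, p in zip(xs, us, prices))
    return running + GAMMA * (xs[-1] - X_TARGET) ** 2

def pmp_gradient_step(x0, us, prices, lr=0.02, u_max=0.3):
    xs = rollout(x0, us)                       # forward pass
    lam = 2.0 * GAMMA * (xs[-1] - X_TARGET)    # terminal costate lambda_T
    new_us = list(us)
    for t in reversed(range(len(us))):         # backward pass
        # dH/du_t = dl/du_t + lambda_{t+1} * df/du_t
        dH_du = prices[t] + 2.0 * ALPHA * us[t] + lam * dg_du(us[t])
        new_us[t] = min(u_max, max(-u_max, us[t] - lr * dH_du))
        # lambda_t = dl/dx_t + lambda_{t+1} * df/dx_t, with df/dx_t = 1 here
        dl_dx = 2.0 * BETA * (max(0.0, xs[t] - X_MAX) - max(0.0, X_MIN - xs[t]))
        lam = dl_dx + lam
    return new_us

def solve(x0, prices, iters=500):
    us = [0.0] * len(prices)
    history = [total_cost(x0, us, prices)]
    for _ in range(iters):
        us = pmp_gradient_step(x0, us, prices)
        history.append(total_cost(x0, us, prices))
    return us, history
```

For example, solve(0.5, [0.2, 0.2, 0.8, 0.8]) starts from the all-zero control sequence. Clipping to [-u_max, u_max] enforces the control bounds, while the state limits enter only through the soft penalty.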

13 of 18

Neural-PMP’s Properties

14 of 18

Simulation Setups

Benchmarked Algorithms:

i) a linear-convex solver, which first uses linear regression to fit linearized system dynamics and then solves the MPC problem with a CVXPY solver;

ii) the state-of-the-art model-free RL algorithm Proximal Policy Optimization (PPO);

iii) a model-based random shooting MPC (RS-MPC) controller.

We test on both single-battery and multiple-battery settings.
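For readers unfamiliar with baseline iii), random shooting can be sketched as follows: sample many candidate control sequences, score each with the model, and keep the cheapest. The one-step model, cost terms, bounds, and sample count below are illustrative assumptions and not the benchmark configuration used in the experiments.

```python
import random

def model(x, u):
    """Assumed one-step SOC model used for planning."""
    return x + (0.95 - 0.2 * u * u) * u

def sequence_cost(x0, us, prices, x_min=0.0, x_max=1.0, beta=5.0):
    """Energy cost plus a soft penalty for leaving the SOC limits."""
    x, cost = x0, 0.0
    for u, p in zip(us, prices):
        cost += p * u
        x = model(x, u)
        cost += beta * (max(0.0, x - x_max) ** 2 + max(0.0, x_min - x) ** 2)
    return cost

def random_shooting(x0, prices, n_candidates=500, u_max=0.3, seed=0):
    """Return the lowest-cost sequence among uniformly sampled candidates."""
    rng = random.Random(seed)
    best_us, best_cost = None, float("inf")
    for _ in range(n_candidates):
        us = [rng.uniform(-u_max, u_max) for _ in prices]
        cost = sequence_cost(x0, us, prices)
        if cost < best_cost:
            best_us, best_cost = us, cost
    return best_us, best_cost
```

In a receding-horizon loop only the first action of the best sequence is applied before replanning. The method needs no gradients, but the number of samples required to find good sequences grows quickly with the horizon and the number of batteries.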

15 of 18

Simulation Results: Single Battery

Observations:

  • Neural-PMP is the only profitable controller
  • Superior sample efficiency
  • Zero constraint violations

16 of 18

Simulation Results: Multiple Batteries

  • Neural-PMP extends naturally to the multi-battery case

  • All batteries reach the target SOC
  • Profit is maximized under price variations
  • Scales well with more battery units

17 of 18

Conclusions and Future Work

Neural-PMP integrates PMP with NN dynamics for optimal battery control

Advantages:

  • Sample-efficient
  • Safer and interpretable solutions
  • Outperforms state-of-the-art MPC, PPO, and RS-MPC

Future work:

  • Effect of NN modeling errors
  • Policy initialization strategies
  • Theoretical guarantees under uncertainty

18 of 18

Thank you!
