1 of 15

COMPARING THE PERFORMANCE OF A PID CONTROLLER WITH THAT OF A THERMOSTAT.

2 of 15

GOAL

To compare how a thermostat, a PID controller, and our DDPG reinforcement learning agent control the indoor temperature of a room.

3 of 15

QUESTIONS

  • How does the control of the RL model compare with that of a PID controller and a thermostat?
  • Does any one controller manage the cost of the heat used better than the others?

4 of 15

DEEP DETERMINISTIC POLICY GRADIENT

The deep deterministic policy gradient (DDPG) algorithm is a model-free, online, off-policy reinforcement learning method. A DDPG agent is an actor-critic reinforcement learning agent that searches for an optimal policy, that is, one that maximizes the expected long-term reward.

During training, a DDPG agent:

  • Updates the actor and critic properties at each time step during learning.
  • Stores past experience using a circular experience buffer. The agent updates the actor and critic using a mini-batch of experiences randomly sampled from the buffer.
  • Perturbs the action chosen by the policy using a stochastic noise model at each training step.
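
The buffer and noise steps above can be sketched in a few lines of Python. This is only an illustration, not the code behind the Simulink models: the names `ReplayBuffer`, `OUNoise` and `perturb_action` are hypothetical, and the Ornstein-Uhlenbeck process (with the `theta` and `sigma` values shown) is an assumed choice of stochastic noise model.

```python
import random
from collections import deque


class ReplayBuffer:
    """Circular experience buffer: once full, the oldest entry is overwritten."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random mini-batch, drawn without replacement.
        return random.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


class OUNoise:
    """Ornstein-Uhlenbeck process for a scalar action (assumed noise model)."""

    def __init__(self, theta=0.15, sigma=0.2, dt=1.0, seed=None):
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.rng = random.Random(seed)
        self.state = 0.0

    def sample(self):
        # Mean-reverting drift toward zero plus a Gaussian disturbance.
        drift = -self.theta * self.state * self.dt
        kick = self.sigma * (self.dt ** 0.5) * self.rng.gauss(0.0, 1.0)
        self.state += drift + kick
        return self.state


def perturb_action(policy_action, noise, low, high):
    """Add exploration noise to the actor's action, then clip to the valid range."""
    return max(low, min(high, policy_action + noise.sample()))
```

During training, each transition would be stored with `add`, the actor and critic would be updated from `sample(batch_size)`, and `perturb_action` would be applied to the actor's output before it is sent to the heater.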

5 of 15

PID TEMPERATURE CONTROLLER

A PID temperature controller, as its name implies, is an instrument used to control temperature, mainly without extensive operator involvement. A PID controller in a temperature control system takes its input from a temperature sensor, such as a thermocouple or RTD, and compares the actual temperature to the desired control temperature, or setpoint. It then provides an output to a control element.
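
As a rough illustration of the compare-and-output loop described above, the following Python sketch implements a discrete PID update. It is not the Simulink PID block used in this work: the class name `PID`, the gains, the output limits (read here as a heater duty between 0 and 1) and the simple anti-windup rule are all assumptions made for the example.

```python
class PID:
    """Discrete PID controller with output clamping (illustrative sketch)."""

    def __init__(self, kp, ki, kd, setpoint, out_min=0.0, out_max=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_error = None  # no derivative term on the first call

    def update(self, measurement, dt):
        # Error between the desired and the measured temperature.
        error = self.setpoint - measurement
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        candidate_integral = self.integral + error * dt

        raw = (self.kp * error
               + self.ki * candidate_integral
               + self.kd * derivative)
        output = max(self.out_min, min(self.out_max, raw))

        # Assumed anti-windup rule: keep the new integral only when the
        # output is not saturated.
        if output == raw:
            self.integral = candidate_integral
        self.prev_error = error
        return output
```

Calling `update(measured_temperature, dt)` once per sample period returns the signal for the control element; with a 21 degree setpoint and `kp=0.5`, a reading of 20 degrees gives an output of 0.5.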

6 of 15

SIMULINK MODELS

7 of 15

Controller

8 of 15

THERMOSTAT TEMPERATURE CONTROLLER

9 of 15

THERMOSTAT RESULTS

10 of 15

PID TEMPERATURE CONTROLLER

11 of 15

PID RESULTS

12 of 15

DDPG REINFORCEMENT LEARNING TEMPERATURE CONTROLLER

13 of 15

RL RESULTS

14 of 15

TRAINING PROCESS

15 of 15

CONCLUSIONS

  • The thermostat does not maintain a constant temperature well, as the jagged lines during the control process show.
  • The PID controller’s control process is very smooth and stays close to the set point, apart from the initial overshoot.
  • The RL model’s control process is also fairly smooth, but its error from the set point is greater than that of the PID controller.
  • Surprisingly, the cost from the RL model is marginally lower than that of the PID controller and the thermostat.