Reinforcement learning (RL)
The most important feature distinguishing reinforcement learning from other types of learning is that it uses training information that evaluates the actions taken rather than instructs by giving correct actions.
• Evaluates actions rather than instructing by giving correct actions.
• Purely evaluative feedback depends entirely on the action taken; purely instructive feedback does not depend on it at all.
• Evaluative feedback indicates how good the action taken was, but not whether it was the best or worst action possible.
• Supervised learning is instructive; optimization is evaluative.
In other words, evaluative feedback depends entirely on the action taken, whereas instructive feedback is independent of it: the correct action is revealed whatever the learner actually did.
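The contrast between the two kinds of feedback can be sketched in a few lines of Python. This is only an illustration: the function names, the reward means, and the unit-variance Gaussian noise are assumptions made for the example, not part of the notes.

```python
import random

def evaluative_feedback(action, true_means, rng):
    """Bandit-style feedback: a noisy reward for the chosen action only.

    The learner sees how good its action was, but learns nothing
    directly about the actions it did not take.
    """
    return rng.gauss(true_means[action], 1.0)

def instructive_feedback(action, correct_action):
    """Supervised-style feedback: the correct action, whatever was chosen."""
    return correct_action

# Hypothetical 3-action problem; action 1 has the highest mean reward.
true_means = [0.2, 1.0, 0.5]
correct_action = 1

rng = random.Random(0)
for action in range(len(true_means)):
    reward = evaluative_feedback(action, true_means, rng)
    label = instructive_feedback(action, correct_action)
    print(f"action={action}  evaluative reward={reward:.2f}  instructive label={label}")
```

The instructive label is the same on every row, while the evaluative reward changes with the action and never says which action would have been best.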
Upper Confidence Bounds (UCB)
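UCB = estimated mean reward + exploration bonus
where the exploration bonus is larger for actions that have been tried less often, reflecting greater uncertainty in their estimated reward.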
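A minimal sketch of UCB action selection follows. The notes do not give the form of the exploration bonus, so the sketch assumes one common choice, c * sqrt(ln t / N(a)), where t is the current time step, N(a) is the number of times action a has been chosen, and c is an exploration constant. The function names, the value c = 2.0, and the Gaussian reward model are likewise assumptions made for illustration.

```python
import math
import random

def ucb_select(counts, values, t, c=2.0):
    """Pick the action with the highest upper confidence bound.

    counts[a] is how often action a has been tried, values[a] is its
    estimated mean reward, and t is the current (1-based) time step.
    """
    # An untried action has unbounded uncertainty, so try it first.
    for a, n in enumerate(counts):
        if n == 0:
            return a
    scores = [
        values[a] + c * math.sqrt(math.log(t) / counts[a])  # mean + bonus
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)

def run_ucb(true_means, steps=2000, c=2.0, seed=0):
    """Run UCB on a bandit with (assumed) unit-variance Gaussian rewards."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    values = [0.0] * k
    for t in range(1, steps + 1):
        a = ucb_select(counts, values, t, c)
        reward = rng.gauss(true_means[a], 1.0)
        counts[a] += 1
        # Incremental update of the sample-average estimate.
        values[a] += (reward - values[a]) / counts[a]
    return counts, values

counts, values = run_ucb([0.0, 0.5, 2.0])
print("pulls per action:", counts)
print("estimated means:", [round(v, 2) for v in values])
```

Because the bonus shrinks as N(a) grows and rises slowly with ln t, a neglected action is eventually retried, while clearly inferior actions are chosen less and less often.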
Thompson Sampling (the Bernoulli bandit)
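In the Bernoulli bandit each action pays a reward of 1 with some unknown probability and 0 otherwise. One common formulation of Thompson sampling for this setting, sketched here under stated assumptions, keeps a Beta posterior over each action's success probability, draws one sample from every posterior, and plays the action whose sample is largest. The uniform Beta(1, 1) prior, the function name, and the success probabilities below are assumptions made for illustration.

```python
import random

def thompson_bernoulli(true_probs, steps=2000, seed=0):
    """Thompson sampling on a Bernoulli bandit with Beta(1, 1) priors."""
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1] * k   # 1 + observed successes per action
    beta = [1] * k    # 1 + observed failures per action
    pulls = [0] * k
    for _ in range(steps):
        # Draw one plausible success probability per action from its posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        a = max(range(k), key=samples.__getitem__)
        # Simulated environment: reward is 1 with probability true_probs[a].
        reward = 1 if rng.random() < true_probs[a] else 0
        # Conjugate Beta-Bernoulli posterior update.
        alpha[a] += reward
        beta[a] += 1 - reward
        pulls[a] += 1
    return pulls, alpha, beta

pulls, alpha, beta = thompson_bernoulli([0.1, 0.5, 0.9])
print("pulls per action:", pulls)
print("posterior means:", [round(x / (x + y), 2) for x, y in zip(alpha, beta)])
```

Exploration here comes from the randomness of the posterior samples rather than from an explicit bonus: an action with little data has a wide posterior, so it occasionally produces the largest sample and gets tried.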