LO 2.3.1.B

Learning Objective: Construct a task as an RL problem.

Review:

     

EXAMPLES OF TASKS

Determining the Placement of Ads on a Web Page

Creating a Personalized Learning System

Controlling a Walking Robot

Agent

The program decides how many ads are appropriate for a page.

The program decides what to show next in an online learning catalog.

The program controls a walking robot.

Environment

The web page.

The learning system.

The real world.

Action

One of three actions: (1) adding another ad to the page; (2) removing an ad from the page; (3) leaving the page unchanged.

Presenting a new class video along with an advertisement.

One of four moves: (1) forward; (2) backward; (3) left; (4) right.

Reward

Positive when revenue increases; negative when revenue drops.

Positive if the user clicks the class video presented; a larger positive reward if the user clicks the advertisement; negative if the user leaves.

Positive when the robot approaches the target destination; negative when it wastes time, goes in the wrong direction, or falls down.

Notes

In this scheme, the agent observes the environment to obtain its current state. The state can be, for example, how many ads are on the web page and whether there is room for more.

At each step, the agent then chooses one of the three actions. If it receives positive rewards when revenue increases and negative rewards when revenue falls, it can learn an effective strategy over time.

Such a program can make a personalized learning system more valuable: users benefit from more effective learning, and the system benefits from more effective advertising.

Here, a robot can teach itself to maneuver more effectively by adapting its action policy based on the rewards it receives.
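
Worked sketch (ad placement task)

The ad placement column above can be sketched in Python to show how agent, environment, action, and reward fit together. This is a minimal illustration under stated assumptions, not a method given in the assigned reading: the names AdPageEnv, revenue, MAX_ADS, train, and greedy_policy are hypothetical, the revenue curve is invented so that revenue peaks at three ads, and tabular Q-learning is used only as one possible way for the agent to learn from rewards.

```python
import random

# The three actions from the table above.
ACTIONS = ("add_ad", "remove_ad", "no_change")
MAX_ADS = 5  # hypothetical page capacity (assumption)


def revenue(num_ads):
    """Hypothetical revenue curve (assumption): peaks at 3 ads, then falls."""
    return num_ads * (6 - num_ads)


class AdPageEnv:
    """Environment: a web page whose state is the number of ads shown."""

    def __init__(self):
        self.num_ads = 0

    def reset(self, num_ads=0):
        self.num_ads = num_ads
        return self.num_ads

    def step(self, action):
        """Apply an action; return (next_state, reward)."""
        before = revenue(self.num_ads)
        if action == "add_ad" and self.num_ads < MAX_ADS:
            self.num_ads += 1
        elif action == "remove_ad" and self.num_ads > 0:
            self.num_ads -= 1
        # Reward is positive when revenue rises, negative when it drops.
        reward = revenue(self.num_ads) - before
        return self.num_ads, reward


def train(episodes=2000, steps=20, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Agent: epsilon-greedy tabular Q-learning over (state, action) pairs."""
    rng = random.Random(seed)
    env = AdPageEnv()
    q = {(s, a): 0.0 for s in range(MAX_ADS + 1) for a in ACTIONS}
    for _ in range(episodes):
        # Random start states so every ad count is visited during training.
        state = env.reset(rng.randrange(MAX_ADS + 1))
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            next_state, reward = env.step(action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Standard Q-learning update toward reward + discounted best value.
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = next_state
    return q


def greedy_policy(q):
    """Return the highest-valued action for each state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(MAX_ADS + 1)}


if __name__ == "__main__":
    for ads, action in greedy_policy(train()).items():
        print(ads, "ads on page ->", action)
```

Under the assumed revenue curve, the learned policy should add ads while the page has fewer than three, leave the page unchanged at three, and remove ads above three. The same four-part framing applies to the other two columns: only the state, the action set, and the reward function change.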

Source: Assigned reading