LO 2.3.1.B

Learning Objective: Construct a task as an RL problem.

Review:

Tasks for Reinforcement Learning Problems

	EXAMPLES OF TASKS
	Determining the Placement of Ads on a Web Page	Creating a Personalized Learning System	Controlling a Walking Robot
Agent	The program decides how many ads are appropriate for a page.	The program decides what to show next in an online learning catalog.	The program controls a walking robot.
Environment	The web page.	The learning system.	The real world.
Action	One of three: (1) adding another ad to the page; (2) dropping an ad from the page; (3) leaving the page unchanged.	Showing a new class video and an advertisement.	One of four moves: (1) forward; (2) backward; (3) left; (4) right.
Reward	Positive when revenue increases; negative when revenue drops.	Positive if the user clicks the class video presented; larger positive if the user clicks the advertisement; negative if the user leaves.	Positive when it approaches the target destination; negative when it wastes time, goes in the wrong direction, or falls down.
Notes	In this scheme, the agent observes the environment to get its current state, such as how many ads are on the web page and whether there is room for more. At each step, the agent then chooses one of the three actions. If it receives positive rewards when revenue increases and negative rewards when revenue falls, it can learn an effective strategy.	Such a program can make a personalized class system more valuable: users benefit from more effective learning, and the system benefits from more effective advertising.	Here, a robot can teach itself to maneuver more effectively by adapting its action policy based on the rewards it receives.

Source: Assigned reading
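
Example sketch (not from the assigned reading): the ad-placement column of the table can be written out as a small RL problem in Python. The state is the number of ads on the page, the three actions match the Action row, and the reward is +1 when revenue increases, -1 when it drops, and 0 otherwise. Everything else is an assumption made for illustration: the page capacity MAX_ADS, the made-up revenue curve, the function names, and the choice of tabular Q-learning as the learning method (the table does not name an algorithm).

```python
import random

ACTIONS = ("add_ad", "drop_ad", "no_change")
MAX_ADS = 5  # assumed page capacity (hypothetical)


def revenue(num_ads):
    # Made-up revenue curve: peaks at 3 ads, then falls as the page gets cluttered.
    return num_ads * (6 - num_ads)


def step(num_ads, action):
    """Environment: apply the action, return (next_state, reward)."""
    if action == "add_ad":
        next_ads = min(num_ads + 1, MAX_ADS)
    elif action == "drop_ad":
        next_ads = max(num_ads - 1, 0)
    else:
        next_ads = num_ads
    change = revenue(next_ads) - revenue(num_ads)
    # Reward follows the table: positive if revenue rose, negative if it fell.
    reward = 1 if change > 0 else -1 if change < 0 else 0
    return next_ads, reward


def train(episodes=2000, steps=20, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Agent: tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(MAX_ADS + 1) for a in ACTIONS}
    for _ in range(episodes):
        state = rng.randint(0, MAX_ADS)  # start each episode on a random page
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """Best-known action for each state (number of ads on the page)."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(MAX_ADS + 1)}


policy = greedy_policy(train())
for num_ads, action in policy.items():
    print(num_ads, "ads ->", action)
```

With this assumed revenue curve, the learned policy should add ads while the page has fewer than three, leave the page unchanged at three, and drop ads above three. The other two columns of the table could be framed the same way by swapping in their own states, actions, and reward rules.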