CMPT 419/983 - Theoretical Foundations of Reinforcement Learning (Fall 2023)

Lectures (beginning Sep 8): Fridays, 2:30 pm - 5:20 pm, AQ 3150

Instructor: Sharan Vaswani

TA: Michael Lu

Course Objective: Numerous applications in machine learning involve interacting with the world, collecting data, and reasoning about it, all under incomplete information. Reinforcement Learning (RL) is a general framework for interactive learning and has been applied to clinical trials, monitoring industrial plants, robotics, games such as Atari and Go, and computational marketing. This course introduces the foundational concepts in bandits and RL from a rigorous theoretical perspective. It aims to give students experience in: (1) proving theoretical guarantees for reinforcement learning algorithms; (2) mapping problems from practical applications (e.g., recommender systems, social networks) to the RL framework; and (3) developing and analyzing new bandit and RL algorithms.

Prerequisites: Probability, Linear Algebra, Multivariable Calculus, Undergraduate Machine Learning

Textbook: There is no required textbook. We will use the resources cited in the References column of the schedule below.

Grading: Assignments 48%, Project 50%, Participation 2%

We will use Piazza for course-related questions and discussions.


Schedule

Date | Topics | Slides | References | Homework
Friday Sep 8 | Course logistics, Multi-armed Bandits, Explore-then-Commit | [L1] | Chapters 4, 5, 6 of [LS20], [Refresher on Probability] |
Friday Sep 15 | UCB, Linear Bandits, LinUCB | [L2] | Chapters 7, 19 of [LS20], [Refresher on Linear Algebra], [List of inequalities] | Assignment 1 released
Friday Sep 22 | LinUCB, Markov Decision Processes | [L3] | Chapter 20 of [LS20], Chapters 2, 5.1-5.2 of [PC23] |
Friday Sep 29 | Bellman Operators, Fundamental Theorem, Value Iteration | [L4] | Chapters 5.2-5.5 of [PC23], [Lecture 2, Csaba notes], [Lecture 3, Csaba notes] |
Friday Oct 6 | Policy Iteration, Linear Programming and MDPs | [L5] | [Lecture 4, Csaba notes], Chapters 5.6, 5.8 of [PC23] | Assignment 1 due, Assignment 2 released
Friday Oct 13 | Monte-Carlo for Policy Evaluation, Temporal Difference Learning | [L6] | Chapters 5, 6 of [SB20], [A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation] |
Friday Oct 20 | Temporal Difference Learning, Approximate Policy Iteration | [L7] | [Lecture 8, Csaba notes] | Project Proposal due
Friday Oct 27 | Approximate Policy Iteration, Politex, Mirror Descent | [L8] | [Lecture 13, Csaba notes], [Lecture 14, Csaba notes] | Assignment 2 due, Assignment 3 released
Friday Nov 3 | Mirror Descent, Policy Gradient | [L9] | Chapter 4 of [Convex Optimization: Algorithms and Complexity] |
Friday Nov 10 | Softmax Policy Gradient, Natural Policy Gradient | [L10] | [On the Global Convergence Rates of Softmax Policy Gradient Methods], [A Natural Policy Gradient] |
Friday Nov 17 | Natural Policy Gradient, Handling Stochasticity | [L11] | [Understanding the Effect of Stochasticity in Policy Optimization] | Assignment 3 due, Assignment 4 released
Friday Nov 24 | TRPO/PPO, Exploration in Tabular MDPs, Wrapping Up | [L12] | [Trust Region Policy Optimization], Chapter 7 of [AJKS22] |
Friday Dec 1 | Project Presentations | | |
Friday Dec 8 | | | | Assignment 4 due
Friday Dec 15 | | | | Final Project Report due

Related Courses