CMPT 419/983 - Theoretical Foundations of Reinforcement Learning (Fall 2023)

Lectures (beginning Sep 8): Fridays, 2:30 pm - 5:20 pm, AQ 3150

Instructor: Sharan Vaswani

TA: Michael Lu

Course Objective: Numerous applications in machine learning involve interacting with the world, collecting data, and reasoning about it, all under incomplete information. Reinforcement Learning (RL) is a general framework for interactive learning and has been applied to clinical trials, monitoring industrial plants, robotics, games such as Atari and Go, and computational marketing. This course introduces the foundational concepts in bandits and RL from a rigorous theoretical perspective. It aims to give students experience in: (1) proving theoretical guarantees for reinforcement learning algorithms; (2) mapping problems from practical applications (e.g., recommender systems, social networks) to the RL framework; and (3) developing and analyzing new bandit and RL algorithms.

Prerequisites: Probability, Linear Algebra, Multivariable Calculus, Undergraduate Machine Learning

Textbook: There is no required textbook. We will use the resources cited in the References column of the schedule below.

Grading: Assignments 48%, Project 50%, Participation 2%

We will use Piazza for course-related questions and discussions.


Schedule

Date | Topics | Slides | References | Homework
Friday Sep 8 | Course logistics, Multi-armed Bandits, Explore-then-Commit | [L1] | Chapters 4, 5, 6 of [LS20], [Refresher on Probability] |
Friday Sep 15 | UCB, Linear Bandits, LinUCB | [L2] | Chapters 7, 19 of [LS20], [Refresher on Linear Algebra], [List of inequalities] | Assignment 1 released
Friday Sep 22 | LinUCB, Markov Decision Processes | [L3] | Chapter 20 of [LS20], Chapters 2, 5.1-5.2 of [PC23] |
Friday Sep 29 | Bellman Operators, Fundamental Theorem, Value Iteration | [L4] | Chapters 5.2-5.5 of [PC23], [Lecture 2, Csaba notes], [Lecture 3, Csaba notes] |
Friday Oct 6 | Policy Iteration, Linear Programming and MDPs | [L5] | [Lecture 4, Csaba notes], Chapters 5.6, 5.8 of [PC23] | Assignment 1 due, Assignment 2 released
Friday Oct 13 | Monte-Carlo for Policy Evaluation, Temporal Difference Learning | [L6] | Chapters 5, 6 of [SB20], [A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation] |
Friday Oct 20 | Temporal Difference Learning, Approximate Policy Iteration | [L7] | [Lecture 8, Csaba notes] | Project Proposal due
Friday Oct 27 | Approximate Policy Iteration, Politex, Mirror Descent | [L8] | [Lecture 13, Csaba notes], [Lecture 14, Csaba notes] | Assignment 2 due, Assignment 3 released
Friday Nov 3 | Mirror Descent, Policy Gradient | [L9] | Chapter 4 of [Convex Optimization: Algorithms and Complexity] |
Friday Nov 10 | Softmax Policy Gradient, Natural Policy Gradient | [L10] | [On the Global Convergence Rates of Softmax Policy Gradient Methods], [A Natural Policy Gradient] |
Friday Nov 17 | Natural Policy Gradient, Handling Stochasticity | [L11] | [Understanding the Effect of Stochasticity in Policy Optimization] | Assignment 3 due, Assignment 4 released
Friday Nov 24 | TRPO/PPO, Exploration in Tabular MDPs, Wrapping Up | [L12] | [Trust Region Policy Optimization], Chapter 7 of [AJKS22] |
Friday Dec 1 | Project Presentations | | |
Friday Dec 8 | | | | Assignment 4 due
Friday Dec 15 | | | | Final Project Report due

Related Courses