Dynamic Programming
We have defined concepts and properties such as value functions, Bellman equations, and Bellman operators. The question now is: how can we find the optimal policy? Before we start, we assume that the dynamics of the MDP are given, that is, we know the transition distribution $P(s' \mid s, a)$ and the reward function.
DP methods exploit the structure of the MDP, in particular the recursive structure encoded in the Bellman equation, to compute value functions.
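As a reminder, the recursion in question is the Bellman equation for a policy $\pi$, stated here in the standard discounted setting with transition kernel $P$, expected reward $r(s, a)$, and discount factor $\gamma \in [0, 1)$:

$$
V^\pi(s) = \sum_{a} \pi(a \mid s) \left[ r(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^\pi(s') \right].
$$

DP methods turn this fixed-point characterization into update rules, as the two problems below illustrate.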
Policy Evaluation
Policy evaluation is the problem of computing the value function of a given policy. Given an MDP $(\mathcal{S}, \mathcal{A}, P, r, \gamma)$ and a policy $\pi$, the goal is to compute $V^\pi(s)$ for every state $s \in \mathcal{S}$.
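As a minimal sketch of how DP does this, the following implements iterative policy evaluation: start from an arbitrary $V_0$ and repeatedly apply the Bellman operator, $V_{k+1} = T^\pi V_k$, until the iterates stop changing. The array conventions (P, r, pi) are illustrative assumptions, not notation fixed by these notes.

```python
import numpy as np

def policy_evaluation(P, r, pi, gamma=0.9, tol=1e-8):
    """Iterative policy evaluation: repeatedly apply the Bellman
    operator T^pi, a gamma-contraction whose fixed point is V^pi.

    Assumed (illustrative) conventions:
      P[s, a, s'] : transition probability P(s' | s, a)
      r[s, a]     : expected immediate reward for taking a in s
      pi[s, a]    : probability that the policy picks a in s
    """
    n_states, _ = r.shape
    # Expected reward and state-to-state transitions induced by pi.
    r_pi = np.einsum("sa,sa->s", pi, r)    # r^pi(s)
    P_pi = np.einsum("sa,sap->sp", pi, P)  # P^pi(s, s')
    V = np.zeros(n_states)
    while True:
        V_next = r_pi + gamma * P_pi @ V   # V <- T^pi V
        if np.max(np.abs(V_next - V)) < tol:
            return V_next
        V = V_next
```

Because $V^\pi$ also solves the linear system $(I - \gamma P^\pi) V = r^\pi$, small problems could be solved exactly in one step; the iterative form is shown because it mirrors the Bellman-operator view used in these notes.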
Control
Control is the problem of finding the optimal value function $V^*$ (and, from it, an optimal policy $\pi^*$).
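A minimal sketch of one standard DP method for control is value iteration, which repeatedly applies the Bellman optimality operator, $V_{k+1}(s) = \max_a \left[ r(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V_k(s') \right]$. It reuses the illustrative P and r conventions from the sketch above; the tiny example at the end is hypothetical.

```python
import numpy as np

def value_iteration(P, r, gamma=0.9, tol=1e-8):
    """Value iteration: repeatedly apply the Bellman optimality
    operator T*, also a gamma-contraction, with fixed point V*.

    Returns the (approximate) optimal value function and the
    greedy policy with respect to it, which is optimal.
    """
    n_states, _ = r.shape
    V = np.zeros(n_states)
    while True:
        Q = r + gamma * np.einsum("sap,p->sa", P, V)  # Q(s, a)
        V_next = Q.max(axis=1)                        # V <- T* V
        if np.max(np.abs(V_next - V)) < tol:
            return V_next, Q.argmax(axis=1)
        V = V_next

# Tiny 2-state, 2-action example (hypothetical numbers).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.5, 0.5]]])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])
V_star, pi_star = value_iteration(P, r)
```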