Hi, could someone give me some intuition for this passage:

"Note that although there could be more than one optimal policy for a given
MDP, there can only be one optimal state-value function and optimal action-value
function." ?

Thank you

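A value function assigns a number to each state (or to each state-action pair). For example, suppose that in some state the optimal action values are: up 2.33, down 1.25, and left and right both 4.56. Those numbers are unique, because the optimal value is the best return achievable, and a maximum is a single number even when several choices attain it. However, if you use this value function to derive a policy, left and right are tied for best. So a policy that goes left and a policy that goes right are both optimal: two different optimal policies, one optimal value function.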
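To make the tie concrete, here is a minimal Python sketch that reuses the numbers from the example. The names `q_star` and `optimal_actions` and the tolerance `tol` are made up for illustration, and the values are treated as the optimal action values of one hypothetical state.

```python
# Hypothetical optimal action values Q*(s, a) for a single state s,
# taken from the example numbers in this thread.
q_star = {"up": 2.33, "down": 1.25, "left": 4.56, "right": 4.56}


def optimal_actions(q_values, tol=1e-9):
    """Return the best value and every action that attains it."""
    best = max(q_values.values())
    # Keep all actions whose value ties with the maximum (within tol).
    actions = [a for a, q in q_values.items() if abs(q - best) <= tol]
    return best, actions


v_star, greedy_actions = optimal_actions(q_star)
print(v_star)          # 4.56 -> a single optimal value for the state
print(greedy_actions)  # ['left', 'right'] -> two equally good actions
```

Any policy that picks one of the tied actions (or randomizes between them) gets the same value of 4.56 in this state, which is why the optimal policy is not unique while the value is.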