SAC

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

Introduction and Notation

Maximum entropy reinforcement learning optimizes policies to maximize both the expected return and the expected entropy of the policy.
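
As a rough illustration of this objective, the sketch below scores a single sampled trajectory by summing, at each step, the reward plus a weighted entropy bonus for the policy's action distribution in that state. It is only a sketch under stated assumptions, not the paper's implementation: the function names `entropy` and `soft_return` are hypothetical, the action space is taken to be discrete so the entropy can be computed exactly, and the temperature `alpha` and discount `gamma` are illustrative values. The true objective is an expectation over trajectories, which this single-trajectory sum only samples.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    # Zero-probability actions contribute nothing, so skip them to avoid log(0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def soft_return(rewards, action_probs, alpha=0.2, gamma=0.99):
    """Entropy-augmented discounted return of one trajectory.

    rewards[t]      : reward received at step t
    action_probs[t] : the policy's action probabilities in the state at step t
    alpha           : temperature weighting entropy against reward (assumed value)
    gamma           : discount factor (assumed value)
    """
    total = 0.0
    for t, (r, probs) in enumerate(zip(rewards, action_probs)):
        total += gamma ** t * (r + alpha * entropy(probs))
    return total


if __name__ == "__main__":
    rewards = [1.0, 1.0]
    uniform = [[0.5, 0.5], [0.5, 0.5]]    # maximally random policy
    greedy = [[1.0, 0.0], [1.0, 0.0]]     # deterministic policy
    # With equal rewards, the more random policy scores higher.
    print(soft_return(rewards, uniform), soft_return(rewards, greedy))
```

Setting `alpha=0.0` recovers the ordinary discounted return, which shows how the temperature controls the trade-off between reward and randomness of the policy.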