SAC
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Introduction and Notation
Maximum entropy reinforcement learning optimizes a policy to maximize both its expected return and its expected entropy.
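Written out, the objective might look like the following. This is a sketch that assumes the usual notation of the SAC paper, none of which is defined in these notes so far: r is the reward, \rho_\pi is the state-action marginal induced by the policy \pi, \alpha is a temperature weighting the entropy term, and \mathcal{H} is the entropy of the action distribution.

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s_t)\big)
  = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s_t)} \left[ \log \pi(a \mid s_t) \right]
```

Under these assumptions, setting \alpha = 0 recovers the standard expected-return objective, while a larger \alpha places more weight on keeping the policy stochastic.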