Normalizing the state and action spaces, as well as the reward, is good practice.
Visualise as much as possible, both to build intuition about the method and to spot possible bugs.
If ...
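The normalization tip can be illustrated with a minimal sketch. It assumes continuous observations and actions given as plain lists of floats, and box-shaped action bounds `low`/`high`. The names `RunningMeanStd`, `scale_action`, `unscale_action` and `scale_reward` are hypothetical helpers written for this example. Observations are standardized with a running mean and standard deviation (Welford's online algorithm), and actions are mapped to and from [-1, 1]. Reward scaling here is a simplified variant that divides by the running standard deviation of raw rewards. Some implementations instead scale by the running standard deviation of the discounted return.

```python
import math
from typing import List, Sequence


class RunningMeanStd:
    """Per-dimension running mean/variance via Welford's online algorithm."""

    def __init__(self, dim: int, eps: float = 1e-8) -> None:
        self.count = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations from the mean
        self.eps = eps

    def update(self, x: Sequence[float]) -> None:
        self.count += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (xi - self.mean[i])

    def std(self) -> List[float]:
        # Fall back to 1.0 until there are enough samples for a variance estimate
        if self.count < 2:
            return [1.0] * len(self.mean)
        return [math.sqrt(m / self.count) + self.eps for m in self.m2]

    def normalize(self, x: Sequence[float], clip: float = 10.0) -> List[float]:
        # Standardize each dimension, then clip to limit the effect of outliers
        return [
            max(-clip, min(clip, (xi - mu) / sd))
            for xi, mu, sd in zip(x, self.mean, self.std())
        ]


def scale_action(action, low, high):
    """Map an action from [low, high] to [-1, 1] per dimension (hypothetical helper)."""
    return [2.0 * (a - lo) / (hi - lo) - 1.0 for a, lo, hi in zip(action, low, high)]


def unscale_action(scaled, low, high):
    """Map a policy output in [-1, 1] back to [low, high] (hypothetical helper)."""
    return [lo + 0.5 * (s + 1.0) * (hi - lo) for s, lo, hi in zip(scaled, low, high)]


def scale_reward(reward: float, reward_rms: RunningMeanStd) -> float:
    # Simplified assumption: divide by the running std of raw rewards only.
    # The mean is not subtracted, so the sign of the reward is preserved.
    return reward / reward_rms.std()[0]
```

In a training loop, one would call `update` on each new observation (and reward) before normalizing it. The policy would then act in [-1, 1], with `unscale_action` applied just before the action is sent to the environment. The same statistics must be reused, and frozen, at evaluation time. Otherwise the agent sees inputs on a different scale than the one it was trained on.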