When I first started studying reinforcement learning (RL), I implemented Proximal Policy Optimization (PPO) from scratch using only the psuedocode on OpenAI’s website. It didn’t work and failed to obtain nearly any reward on most OpenAI Gym environments.