Controlling multiple arms
Keeping the hand of a double-jointed arm inside the green sphere

In this project, I solved a continuous control problem with two algorithms: Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO). I worked with the Reacher environment from the Unity ML-Agents Toolkit, in a version that contains 20 identical agents, each with its own copy of the environment. Each agent controls a double-jointed arm, and its goal is to keep the arm's hand at the target location for as many time steps as possible.
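
As a rough illustration of the two algorithms named above, the sketch below shows the core update rule each one is known for: PPO's clipped surrogate objective and DDPG's soft target-network update. It is plain Python for readability, not the project's PyTorch code. The function names and the default values `epsilon=0.2` and `tau=0.001` are assumptions chosen for the example.

```python
def ppo_clipped_objective(ratio, advantage, epsilon=0.2):
    """PPO clipped surrogate for one sample.

    ratio     -- new_policy_prob / old_policy_prob for the taken action
    advantage -- estimated advantage of that action
    epsilon   -- clip range (0.2 is an assumed, commonly used value)
    """
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Taking the minimum keeps the pessimistic estimate, so the policy
    # gains nothing from moving the ratio far outside the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)


def soft_update(target_params, local_params, tau=0.001):
    """DDPG-style soft update: target <- tau * local + (1 - tau) * target.

    Parameters are plain lists of floats here (an assumption for the
    sketch); in practice they would be network weight tensors.
    """
    return [tau * local + (1.0 - tau) * target
            for target, local in zip(target_params, local_params)]
```

For example, `ppo_clipped_objective(1.5, 2.0)` caps the ratio at 1.2 before multiplying by the advantage, and `soft_update([0.0], [1.0], tau=0.1)` moves the target value one tenth of the way toward the local value.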

This project is connected to Udacity's Deep Reinforcement Learning Nanodegree. I used Python, the Unity ML-Agents Toolkit, and PyTorch.
