CSCI-GA 3033-090 Homework 1 to 4 solutions


CSCI-GA 3033-090 Homework – 1, DAgger

Introduction
This homework is designed to follow up on the lecture about Deep Imitation Learning. For this assignment, you will need to know about the imitation learning algorithms we talked about in class, in particular DAgger. If you have not already, we suggest you review the lecture notes.

You are allowed to discuss this homework with your classmates. However, any work that you submit must be your own — which means you cannot share your code, and any work that you submit with your write-up must be written by only you.

Code folder
Find the folder with the provided code in the following google drive folder: https://drive.google.com/drive/folders/1T8B3gSNWjQU-JpifHkEDm9FfoxJA6wB_?usp=sharing. Download all the files into the same directory, follow the instructions in the env_installation.md file, and then work through the dagger_template.py file.
Submission
Please submit your homework using this google form link: https://forms.gle/c4DSHR8LYzfMPT1SA

Deadline for submission: September 17, 2021, 11:59 PM Eastern Time.
Points
● Question 1 is worth 10 points.
● Questions 2 and 3 are worth 5 points each.
● Bonus question: 5 points.
● Total: 20 points (max 25 with bonus).

DAgger
In class, we learned about the DAgger (dataset aggregation) algorithm, which is used to clone an expert policy. This method is especially useful when querying the expert is expensive and we want to learn a policy that is almost as good as the expert without making a large number of queries to it.

In this homework, we have provided you with an environment that is hard to learn directly. Thankfully, we have access to an expert in this environment. In this homework, your task will be to utilize DAgger to learn a deep neural network policy that performs well on this task.

Environment
The environment we will use in this homework is built upon the Reacher environment from OpenAI gym (https://gym.openai.com/envs/Reacher-v2/). We have provided our environment in the reacher_env.py file in our code directory. It follows the OpenAI gym API, which you can learn more about at https://github.com/openai/gym#api. For this homework, an agent in this environment is considered successful if it can achieve a mean reward of at least 15.0.

In this homework, we will attempt to learn this agent from image observations. Unfortunately, learning directly from images without any priors is incredibly difficult, since images lie in a very high-dimensional space. Thankfully, we have access to an expert prediction for the environment's current state, which can be retrieved with the get_expert_action() function call. Note: get_expert_action() does not take any arguments, so you must be careful to call it right after you have called .reset() or .step() on the environment to get the expert action associated with the current state.
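For reference, one data-collection episode might look like the following minimal sketch. The class name ReacherEnv and the exact API here are assumptions based on the description above; adapt the names to whatever reacher_env.py and dagger_template.py actually provide.

# Minimal sketch of one DAgger data-collection episode (names such as
# ReacherEnv are placeholders; check reacher_env.py for the real ones).
from reacher_env import ReacherEnv

env = ReacherEnv()
observations, expert_actions = [], []

obs = env.reset()
done = False
while not done:
    # Query the expert immediately after reset()/step(), so the label
    # corresponds to the current state (get_expert_action takes no arguments).
    expert_actions.append(env.get_expert_action())
    observations.append(obs)

    # Step with the current learned policy; a random action stands in here.
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)

# The collected (observation, expert_action) pairs are then aggregated into
# the dataset, and the policy is retrained on everything collected so far.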
Question 1
Download the code folder, with all associated files, from https://drive.google.com/drive/folders/1T8B3gSNWjQU-JpifHkEDm9FfoxJA6wB_?usp=sharing. Complete the code template provided in dagger_template.py, filling in every TODO section, to implement DAgger. Attach the completed file to your submission.
Question 2
Create a plot with the number of expert queries on the X-axis and the performance of the imitation model on the Y-axis. Elaborate on any clear trends you see. (Hint: in the env, the variable expert_calls counts the number of expert queries.)
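If it helps, the plot can be produced along these lines; the expert_calls attribute is the counter mentioned in the hint, and the numbers below are placeholder values, not real results.

# Sketch of the query-vs-performance plot with matplotlib. Replace the
# placeholder lists with values recorded after each DAgger iteration
# (e.g. env.expert_calls and the mean reward of an evaluation rollout).
import matplotlib.pyplot as plt

expert_queries = [100, 300, 600, 1000]   # placeholder values
mean_rewards = [2.0, 8.5, 13.0, 16.2]    # placeholder values

plt.plot(expert_queries, mean_rewards, marker="o")
plt.xlabel("Number of expert queries")
plt.ylabel("Mean evaluation reward")
plt.title("DAgger: expert queries vs. performance")
plt.savefig("dagger_queries_vs_reward.png")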
Question 3
Could you potentially improve on the number of queries to the expert made by the DAgger algorithm? Think about when querying the expert may be redundant.

Bonus points: Try implementing your answer from question 3, and generate a query-vs-reward plot similar to question 2 for this implementation. Compare this plot with your answer from Q2. Is there a clear improvement?
Python environment installation instructions
1. Make sure you have conda installed in your system. Instructions link here.
2. Then, get the conda_env.yml file, and from the same directory, run conda env create -f conda_env.yml. If you don’t have a GPU, you can remove the line "- nvidia::cudatoolkit=11.1".
3. Activate the environment, conda activate hw1_dagger.
4. Then, install pybullet gym using the following instructions: https://github.com/benelot/pybullet-gym#installing-pybullet-gym
(New: alternately, just install pybullet-gym from here: https://github.com/shubhamjha97/pybullet-gym thanks Shubham!)
5. If you installed it from the official repo, go to the pybullet-gym directory, find the file pybullet-gym/pybulletgym/envs/roboschool/envs/env_bases.py, and change L29-L33 to the following:
self._cam_dist = 0.75
self._cam_yaw = 0
self._cam_pitch = -90
self._render_width = 320
self._render_height = 240
6. If you are still having trouble with training, increase the image resize from (60, 80) to something higher.
7. Finally, run the code with python dagger_template.py once you have completed all the to-do steps in the code itself.

CSCI-GA 3033-090 Homework – 2, Deep Q Learning++

Introduction
This homework is designed to follow up on the lecture about Deep Q-Learning. For this assignment, you will need to know the basics of the deep Q-learning algorithm we talked about in class. If you have not already, we suggest you review the lecture notes. Furthermore, you will have to learn about a couple of algorithms that we have not discussed in detail in class, either from the original papers or from blog posts around the internet.

You are allowed to discuss this homework with your classmates. However, any work that you submit must be your own — which means you cannot share your code, and any work that you submit with your write-up must be written by only you.

Code folder
Find the folder with the provided code in the following google drive folder: https://drive.google.com/drive/folders/14VehoGYvIiKFJBbGKZdRquTCnlS9J4kv?usp=sharing. Download all the files into the same directory, follow the instructions in the env_installation.md file, and then complete the drql.py, utils.py, and config.yaml files.

Thanks to Denis Yarats for the template for this code.
Submission
Please submit your homework using this google form link: https://forms.gle/H1BzdhNKT4eJWuPZ9

Deadline for submission: October 8th, 2021, 11:59 PM Eastern Time.
Points
● Questions 1-4 are 5 points each.
● Bonus question: 5 points.
● Total: 20 points (max 25 with bonus).

Deep Q-learning
In class, we learned about the Deep Q-Network (DQN) algorithm, which is considered the first large-scale success of deep reinforcement learning. The method is quite dated now, but many of the algorithms used today trace their roots back to DQN.

One of the algorithms that improved on DQN is Rainbow (https://arxiv.org/abs/1710.02298v1), which combined several contemporary improvements over DQN, such as dueling DQN, double DQN, and prioritized experience replay, into one algorithm. More recently, Data-regularized Q-learning (DrQ) has improved on this baseline by adding image augmentations to DQN.

In this homework, we provide you with an example implementation of DQN. We ask you to add some of the improvements made in Rainbow and DrQ to arrive at an almost state-of-the-art deep RL algorithm.
Environment
The environment we will use in this homework is built upon the Pong, Space Invaders, and Breakout environments from the OpenAI gym Atari suite (https://gym.openai.com/envs/#atari). In this homework, we will attempt to learn these agents from image observations. We already have a working implementation of DQN in the code folder, which you can run as python train.py env=Breakout and so on. Your job is to complete all the TODOs and turn on the completed features one by one. You can download the code folder, with all associated files, from https://drive.google.com/drive/folders/1T8B3gSNWjQU-JpifHkEDm9FfoxJA6wB_?usp=sharing
Question 1
Download the code folder, and run the code for the three given environments. Make a plot of their performance over time. This is your baseline, and you will compare your future improvements to the code with this baseline to test their validity.
Question 2
First, add double Q-learning to the model (find the place by searching for “TODO: double Q learning” in the code files). Make another plot by running the three environments with your code that uses double Q-learning.
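As a reminder of what the change amounts to, here is a minimal PyTorch sketch of the double-Q target; online_net and target_net are placeholder names for whatever networks the provided drql.py defines.

# Sketch of the double Q-learning target (PyTorch). The online network
# picks the argmax action, but the target network evaluates it.
import torch

def double_q_target(online_net, target_net, next_obs, reward, not_done, gamma):
    with torch.no_grad():
        next_action = online_net(next_obs).argmax(dim=1, keepdim=True)
        next_q = target_net(next_obs).gather(1, next_action)
        return reward + not_done * gamma * next_q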
Question 3
Next, implement Prioritized Experience Replay. In the replay_buffer.py file, we already have an implementation of a prioritized replay buffer; use it in your code. Find “TODO prioritized replay buffer” and fix the priority update for the prioritized replay buffer. Make another set of plots and compare them to the plots from Q2. Is your performance better or worse now? Try to explain your observations.
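The priority update typically reduces to the sketch below; the update_priorities method name is an assumption for illustration, so check replay_buffer.py for the buffer's actual interface.

# Sketch of a prioritized-replay priority update: new priorities are the
# absolute TD errors plus a small epsilon so no transition gets zero
# probability of being sampled.
import numpy as np

def updated_priorities(td_errors, eps=1e-6):
    return np.abs(td_errors) + eps

# Typical usage inside the Q update (names are assumptions about the
# provided code, not the actual interface):
#   replay_buffer.update_priorities(idxs, updated_priorities(td_errors))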
Question 4
Finally, implement Dueling DQN. Find the places where you have to put your code by searching for “TODO dueling DQN”. As before, plot your performance in the three environments. Now that you have an almost complete implementation of Rainbow, combine all of your plots together and show the improvement over vanilla DQN.
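For orientation, a minimal dueling head is sketched below in PyTorch; the layer sizes are arbitrary placeholders, and the provided encoder would feed its features into something like this. The key point is combining the value and advantage streams with the mean-subtracted advantage.

# Minimal dueling-DQN head sketch (PyTorch); sizes are placeholders.
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    def __init__(self, feature_dim, num_actions, hidden=512):
        super().__init__()
        self.value = nn.Sequential(
            nn.Linear(feature_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.advantage = nn.Sequential(
            nn.Linear(feature_dim, hidden), nn.ReLU(), nn.Linear(hidden, num_actions))

    def forward(self, features):
        v = self.value(features)               # state value V(s)
        a = self.advantage(features)           # advantages A(s, a)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a) for identifiability
        return v + a - a.mean(dim=1, keepdim=True)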
Question 5
Bonus points: We still haven’t used DrQ anywhere. Read the code to try and figure out how to use DrQ on top of DQN. You can read more in the blog post by the authors of the original DrQ paper here: https://sites.google.com/view/data-regularized-q. You will have to:
● Figure out what the best data augmentations are for the environment you have (one common option is sketched after this list),
● Add those augmentations into the training process, and
● Report (improved) results from using those augmentations.
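One concrete possibility, and the default augmentation in the DrQ paper, is a small random shift: pad the image by a few pixels and crop it back at a random offset. A hedged PyTorch sketch with a pad of 4 pixels is below; where exactly to apply it in the training code is for you to work out from drql.py.

# Sketch of the random-shift augmentation from the DrQ paper: replicate-pad
# the image batch, then crop each image back at a random offset.
import torch
import torch.nn.functional as F

def random_shift(imgs, pad=4):
    # imgs: (B, C, H, W) float tensor of stacked frames
    b, c, h, w = imgs.shape
    padded = F.pad(imgs, (pad, pad, pad, pad), mode="replicate")
    out = torch.empty_like(imgs)
    for i in range(b):
        top = torch.randint(0, 2 * pad + 1, (1,)).item()
        left = torch.randint(0, 2 * pad + 1, (1,)).item()
        out[i] = padded[i, :, top:top + h, left:left + w]
    return out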
Python environment installation instructions
1. Make sure you have conda installed in your system. Instructions link here.
2. Then, get the conda_env.yml file, and from the same directory, run conda env create -f conda_env.yml.
3. Finally, run the code with python train.py once you have completed some of the to-do steps in the code itself.

CSCI-GA 3033-090 Homework 3, Policy Gradient Algorithms

Introduction
This homework is designed to follow up on the lecture about policy gradient algorithms. For this assignment, you will need to know the basics of the policy gradient algorithms we talked about in class, specifically REINFORCE and PPO. If you have not already, we suggest you review the lecture notes.

You are allowed to discuss this homework with your classmates. However, any work that you submit must be your own — which means you cannot share your code, and any work that you submit with your write-up must be written by only you.
Code folder
Find the folder with the provided code in the following google drive folder: https://drive.google.com/drive/folders/1lm9-9in2OheyPowIWnxkCykXfp-u76Go?usp=sharing. Download all the files into the same directory, and run the run.py file to run your code. You will have to complete the TODOs in ppo/ppo.py to complete this homework.

Thanks to Eric Yu for this code.
Environment
We will reuse the environment from homework 2, so you will not need to install anything else on top of it. If you need more directions about setting up with it, see here: https://docs.google.com/document/d/1p_mU1jZEQZk7gP_qgwVPtv6iae4bnjMa2FHkqV5t4K4/edit
Submission
Please submit your homework using this google form link: https://forms.gle/FDgzwJhWSjKtysWi8

Deadline for submission: October 22nd, 2021, 11:59 PM Eastern Time.
Points
● Questions 1-2 are 5 points each.
● Total: 10 points

Questions
1. In the code folder, you will find already available code for running REINFORCE. Run this code on the following environments: Pendulum-v0, BipedalWalker-v3, and LunarLanderContinuous-v2. It is okay if REINFORCE does not perform well in these environments. Train in each of the three environments with three different seeds, and create three plots that show the average training performance of REINFORCE on each environment. Why do you think REINFORCE suffers in these environments?

2. Now, complete the PPO code found in ppo/ppo.py. You will find a few different TODOs to fill in. Follow the original PPO pseudocode if you need to (a sketch of the clipped surrogate objective is included below for reference). Once again, use the previous three environments and three different seeds to plot your training rewards. Clearly show the comparison between REINFORCE and PPO in your plots.

Your expected mean performance should be AT LEAST:
Pendulum: -400
BipedalWalker: 125
LunarLanderContinuous: 100
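The heart of the PPO update is the clipped surrogate objective. The sketch below is a minimal PyTorch version under the assumption that log-probabilities and advantages are already computed; the variable names are placeholders, not the names used in ppo/ppo.py.

# Sketch of PPO's clipped surrogate policy loss (PyTorch). Advantages are
# assumed to be precomputed (e.g. with GAE) and all inputs are 1-D tensors.
import torch

def ppo_policy_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    ratio = torch.exp(new_log_probs - old_log_probs)        # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Maximize the surrogate objective, i.e. minimize its negation.
    return -torch.min(unclipped, clipped).mean()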

Submit your writeup, along with your ppo.py file as your submission. If you change any of the hyperparameters, include run.py as well.

CSCI-GA 3033-090 Homework 4, Exploration Algorithms

Introduction
This homework is designed to follow up on the lecture about exploration algorithms, specifically the multi-armed bandit. For this assignment, you will need to know the basics of the bandit algorithms we talked about in class, and some basics of the epsilon-greedy, upper confidence bound, and Thompson sampling algorithms. If you have not already, we suggest you review the lecture notes and read about these algorithms online.

You are allowed to discuss this homework with your classmates. However, any work that you submit must be your own — which means you cannot share your code, and any work that you submit with your write-up must be written by only you.
Code folder
Find the folder with the provided code in the following google colab: https://colab.research.google.com/drive/19ht5cd7CoEkotj3bWnBaGKHaSdWhObH4?usp=sharing

Make a copy of the colab, edit it, and once you are done, submit it with your homework writeup.
Environment
Since this homework is in Google colab, it will not require any separate environment setup.
Submission
Please submit your homework using this google form link: https://forms.gle/h7gZCK1FFoPd4z4d7

Deadline for submission: October 29th, 2021, 11:59 PM Eastern Time.
Points
● 5 points for plotting.
● 5 points for each solver working correctly: EpsilonGreedy, UCB, and Thompson Sampling.
● Total: 20 points.

Assignment
1. Make a copy of the colab to your drive, and then go through the skeleton code in it. At the very end, you will find a function that is supposed to run the bandit algorithms and plot their cumulative regret over time. Complete this function, and verify that it works by testing it with the two given environments and the FullyRandom solver.

Note: For a full score on this problem, the following must be true: each solver must be denoted by a different color, and each environment (Bernoulli bandit and Gaussian bandit) must be shown on a different plot. Make sure to label each of the two plots and each line in each plot with the associated algorithm. For formatting guidance, look at the given plot.

2. Once you have finished that, implement the EpsilonGreedy, UCB, and Thompson Sampling solvers (their action-selection rules are sketched below). Make sure that when you run the colab notebook, it generates the two associated plots: one for the Bernoulli bandit with all the algorithms, and another for the Gaussian bandit with all the algorithms.
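As a rough guide, the three action-selection rules reduce to the sketch below. The function signatures and argument names are assumptions for illustration, not the interface of the colab skeleton; the solvers there will track their own counts and estimates.

# Sketch of the three solvers' action-selection rules for a k-armed bandit.
import numpy as np

def epsilon_greedy_action(estimates, epsilon):
    # Explore uniformly with probability epsilon, otherwise exploit.
    if np.random.rand() < epsilon:
        return np.random.randint(len(estimates))
    return int(np.argmax(estimates))

def ucb_action(estimates, counts, t, c=2.0):
    # Upper confidence bound: estimate plus an exploration bonus that
    # shrinks as an arm is pulled more often.
    bonus = c * np.sqrt(np.log(t + 1) / (np.asarray(counts) + 1e-8))
    return int(np.argmax(np.asarray(estimates) + bonus))

def thompson_action_bernoulli(alphas, betas):
    # Thompson sampling for Bernoulli arms: sample each arm's Beta
    # posterior and pull the arm with the largest sample.
    return int(np.argmax(np.random.beta(alphas, betas)))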

Submit a link to this completed colab to the submission link above. Before submission, once again make sure that the following are in order:
a) plot titles,
b) axis labels,
c) line legends.

Also, make sure the sharing settings are turned on so we can check your solution and run it.