1. Policy gradient for the CartPole environment. You will train a policy network on the CartPole
environment (https://gym.openai.com/envs/CartPole-v1/) discussed in the demo. The policy network
has three linear layers, with the following architecture:
First fully connected layer (nn.Linear in PyTorch): 4 input features, 24 output features
Second fully connected layer (nn.Linear in PyTorch): 24 input features, 36 output features
Third fully connected layer (nn.Linear in PyTorch): 36 input features, 2 output features
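The three layers listed above can be sketched in PyTorch as follows. This is a minimal sketch, not the demo's code: the ReLU activations between layers, the softmax over the 2 outputs (one probability per CartPole action), and the helper name `select_action` are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Categorical


class PolicyNetwork(nn.Module):
    """4 -> 24 -> 36 -> 2 policy network (ReLU hidden activations assumed)."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 24)   # 4 = CartPole observation size
        self.fc2 = nn.Linear(24, 36)
        self.fc3 = nn.Linear(36, 2)   # 2 = number of actions (push left / right)

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        # Softmax turns the two logits into action probabilities.
        return F.softmax(self.fc3(x), dim=-1)


def select_action(policy, state):
    """Hypothetical helper: sample an action and return its log-probability."""
    probs = policy(torch.as_tensor(state, dtype=torch.float32))
    dist = Categorical(probs)
    action = dist.sample()
    return action.item(), dist.log_prob(action)
```

The log-probability is returned because a policy-gradient update (e.g. REINFORCE) weights it by the episode return; how the demo's training loop consumes it is not shown here.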
Only the neural network part of the code discussed in the demo needs to be replaced. Plot the
total reward per episode as a function of episode number (X axis: episode number; Y axis: total
reward for that episode).
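The requested plot can be produced with a short matplotlib helper like the one below. It assumes the training loop collects one total per episode in a list (for CartPole the reward is +1 per step, so an episode's total equals the number of steps the pole stayed up); the function name and optional output path are illustrative, not part of the demo.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt


def plot_episode_rewards(episode_rewards, path=None):
    """Plot total reward (y) against episode number (x), numbering episodes from 1."""
    episodes = list(range(1, len(episode_rewards) + 1))
    fig, ax = plt.subplots()
    ax.plot(episodes, episode_rewards)
    ax.set_xlabel("Episode number")
    ax.set_ylabel("Total reward")
    ax.set_title("CartPole-v1: total reward per episode")
    if path is not None:
        fig.savefig(path)  # e.g. "cartpole_rewards.png"
    return fig, ax
```

In the training loop, append the summed reward to a list at the end of each episode and call `plot_episode_rewards(rewards, "cartpole_rewards.png")` once training finishes.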
2. Automatic hyper-parameter tuning via Bayesian Optimization. In this homework, you will use BO
software to search over two hyper-parameters (ensemble size and decision-tree depth) for both a
Bagging classifier and a Boosting classifier.
Run the BO software for 50 iterations to automate the search for the best hyper-parameters. Plot
the number of BO iterations on the x axis and, on the y axis, the performance of the best
hyper-parameters found so far (that is, the performance of the corresponding trained classifier
on the validation data). Use the Fashion MNIST dataset for this task; you may use a smaller
subset of it if compute power is a hurdle.
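To make the "best performance so far" curve concrete, the sketch below hand-rolls the loop that BO software runs internally: fit a Gaussian-process surrogate to the scores observed so far, pick the next (ensemble size, tree depth) pair by maximizing expected improvement, evaluate it, and record the running best. It is an illustration under stated assumptions, not a replacement for the BO software the assignment asks for: the finite search grid, the Matérn kernel, the 5 random initial points, and counting each objective evaluation (including the initial ones) as one iteration are all choices made here.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern, WhiteKernel


def expected_improvement(mu, sigma, best, xi=0.01):
    """EI acquisition for maximization; xi trades exploration for exploitation."""
    sigma = np.maximum(sigma, 1e-9)  # avoid division by zero
    improvement = mu - best - xi
    z = improvement / sigma
    return improvement * norm.cdf(z) + sigma * norm.pdf(z)


def bayes_opt(objective, grid, n_iter=50, n_init=5, seed=0):
    """Maximize objective(n_estimators, max_depth) over a finite list of int pairs.

    Returns the best pair found and the best score seen after each evaluation.
    """
    rng = np.random.default_rng(seed)
    G = np.asarray(grid, dtype=float)
    span = G.max(axis=0) - G.min(axis=0)
    X = (G - G.min(axis=0)) / np.where(span > 0, span, 1.0)  # scale inputs to [0, 1]
    budget = min(n_iter, len(grid))

    # Initial design: a few random grid points.
    init = rng.choice(len(grid), size=min(n_init, budget), replace=False)
    tried = [int(i) for i in init]
    scores = [objective(*grid[i]) for i in tried]
    best_so_far = [max(scores[: k + 1]) for k in range(len(scores))]

    kernel = ConstantKernel(1.0) * Matern(length_scale=[0.3, 0.3], nu=2.5) + WhiteKernel(1e-3)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=seed)

    while len(tried) < budget:
        gp.fit(X[tried], np.asarray(scores))  # surrogate model of the objective
        seen = set(tried)
        untried = [i for i in range(len(grid)) if i not in seen]
        mu, sigma = gp.predict(X[untried], return_std=True)
        nxt = untried[int(np.argmax(expected_improvement(mu, sigma, max(scores))))]
        tried.append(nxt)
        scores.append(objective(*grid[nxt]))
        best_so_far.append(max(scores))

    return grid[tried[int(np.argmax(scores))]], best_so_far
```

As the objective, pass a function that trains the chosen classifier with the given `(n_estimators, max_depth)` and returns its validation score, with a grid such as `[(n, d) for n in range(10, 201, 10) for d in range(1, 21)]`. The returned list is exactly the y data for the required plot: `plt.plot(range(1, len(curve) + 1), curve)`.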
Please follow all the instructions regarding code submission as mentioned in previous