CMPUT366 P2 — Supervised Learning with Tile Coding (Python) solution




5/5 - (4 votes)

The objective of this project is to implement 2D tile coding and use it to construct the
binary-feature representation for a simple case of supervised learning of a real-valued
As in all supervised learning, the training set consists of a set of examples, where
each example is a pair, input and target, and the objective is to learn a function
f such that f(input) is close to target. In this project, each input consists of two
numbers or coordinates, let’s call them x and y, each of which is between 0 and 6
(that is, input .
= (x, y) 2 [0, 6)2), and the target is a single (scalar) real number.
Thus, a single example might be written ((x, y),target). Here is a training set of
four examples that we will use in the project:
Example# x y target
1 0.1 0.1 3
2 4 2 1
3 5.99 5.99 2
4 4 2.1 1
Receiving such a training set, your job will be to write the Python function f(x, y)
which returns a guess of the target corresponding to the input (x, y). Your guesses
should be close to the targets on the training set and, more importantly, close to
the targets on any new examples that you might see in the future (generalization).
You will “receive the training set” by receiving multiple calls to the Python function
learn(x, y, target), which you will also write. We will discuss these further in Part
2 below; first we focus on the tile-coding part for converting points in input space to
large binary feature vectors (or rather small arrays containing the indices of the few
1 components of the binary feature vectors).
Part 1: Tile Coding
Implement your supervised-learning algorithm using tile coding over x and y. The
first step is to write the function tilecode(x, y,tileIndices) which takes in x, y
and an array (tileIndices) and fills the array with tile indices, one index per tiling.
Use eight standard grid tilings (numTilings=8) of the input space, where each tile
extends 0.6 (one tenth of the [0, 6) range) in each of the x and y directions. You might
think of each tiling as forming a 10⇥10 grid, but actually make it an 11⇥11 grid so
that it extends beyond the input space by one tile width and height. We do this
because each tiling after the first will be o↵set by 1/8th of a tile width and height
0 6
0 6
–0.6/8 0
first tiling second tiling
Figure 1: The relationship between the input space (in bold) and the first and second
11×11 tilings.
in the x and y directions. Thus, the first tiling will look like the left side of Figure
1 and the second tiling will look like the right side of the figure. Subsequent tilings
will each be o↵set by a further 1/8th of a tile in each direction (more generally,
by 1/numTilings, of course). This is not the best way to o↵set the tilings, and
you may see artifacts in your results because of it, but we do it anyway because it
is the simplest. The next better method would be to o↵set each tiling by a random
Every tile has a di↵erent tile number (index). Assuming that you number the tiles
in the natural way, the tiles in the first tiling will run from 0 to 120, and the tiles
in the second tiling will run from 121 to 241 (why?). A given input point will be in
exactly one tile in each tiling. For example, the point from the first example in the
training set above, x = 0.1 and y = 0.1, or (0.1,0.1), will be in the first tile of the first
seven tilings, that is, in tiles 0, 121, 242, 363, 484, 605, 726 (why?). In the eighth
tiling this point will be in the 13th tile (why?), which is tile 859 (why?). If you
call tilecode(0.1,0.1,tileIndices), then afterwards tileIndices will contain
exactly these eight tile indices. The largest possible tile index is 967 (why?).
What to turn in for Part 1. Download the template or shell file (from
the dropbox) and edit it to contain your function tilecode. Test your code by
running that file and printing the results. Turn in this printout (as printout1.pdf)
and your modified file. The code just calls your tilecode function on
the four example input points given above and prints out the tile indices for each
case. If you number the tiles as suggested, then the first example should produce
the eight integer tile indices given above. Another check on your code is that none
of the indices should be negative or greater than 967. Finally, the second and fourth
examples should produce very similar sets of indices (they should have many tiles in
common) (why?). We will use this printout to check that your tilecode function is
working properly and to help in giving partial credit. Finally, provide a list of very
brief answers to the six why questions in the project specification above (as a file
why1.pdf). After you have completed part 1, comment out the four print statements
at the end of the file in preparation for using your tile coder in part 2.
Part 2: Supervised Learning by Gradient Descent
In this part you will use your tile coder in conjunction with linear function approximation to learn an approximate function over the entire input space. The
learning algorithm we use is stochastic gradient descent in the weights of a linear
function approximator; it is known as the Least-Mean-Square (LMS) algorithm in
the literature. As a linear approximator, the function it produces is a linear combination of a vector of learned weights ✓ = (✓1, ✓2, …, ✓n)>, and a vector of features
(x, y)=(1(x, y), 2(x, y), …, n(x, y))> for a given input x, y:
f(x, y) = ✓11(x, y) + ✓22(x, y) + ··· + ✓nn(x, y).
As a gradient-descent learning algorithm, the weights are updated, given an input
(x, y), a corresponding feature vector (x, y) and target value target, and an estimated function value f(x, y), by
✓j ✓j + ↵

target f(x, y)

j (x, y), j = 1, . . . , n.
Your task in this part of the project is to implement the first of these equations in the
python function f(x, y) and the second in the python function learn(x, y, target),
using feature vectors constructed by the tile coder you wrote in Part 1, and assuming
the general setup outlined in the introduction.
First, note that n in the above equations is the total number of tiles in your tile coder,
or 968. Because your arrays are probably indexed from 0, your loops will probably
go from 0 to 967. Second, note that, because you are doing tile coding, you don’t
have the feature vector (x, y) in an explicit form. Instead you have a list of indices
j where j (x, y) is 1, with all others assumed to be 0. If you think about it, this
really simplifies your job of implementing both equations. Of course you will need
to keep the weight vector ✓ in explicit form. That is, you really will need a vector of
968 floating point numbers; this is your memory; initialize all of its elements to zero.
The step-size parameter ↵ should be set to 0.1 divided by the number of tilings.
Completing Part 2 consists of the following steps:
1. Download the file (in the dropbox) which contains test and
template code.
2. Edit the functions f and learn to implement the desired functions, calling your
Tilecoder.tilecode function when needed.
3. Run The last line of the file calls the test1 function, which
calls your learning algorithm on the four example points and prints out results.
As a result of learning, your approximate function value should move 10% of
the way from its original value towards the target value. The before value of
the fourth point should be nonzero (why?). Make sure this is working correctly
before going on to the next step.
4. Change the last line of to call test2 instead of test1 and
execute it again. The test2 function constructs 20 examples using the target
target = sin(x 3) cos(y) + N(0, 0.1),
where N(0, 0.1) is a normally distributed random number with mean 0 and
standard deviation 0.1. The examples are sent to learn, and then the resultant
function f (after 20 examples) is printed out to a file (f20) in a form suitable
for plotting by Excel or similar. A Monte Carlo estimate of the Mean-SquaredError in the function is also printed to standard output. The program then
continues for 10,000 more examples, printing out the MSE after each 1000, and
then finally prints out the function to a file (f10000) for plotting again. You
should see the MSE coming down smoothly from about 0.25 to almost 0.01 and
staying there (why does it not decrease further towards zero?).
5. Make 3D plots of the function learned after 20 and 10,000 examples, using the data printed in step 4. One way to do this is to use the program provided in the dropbox, as in “python f20”. After 10,000 examples, the learned function should look very much like the target function
minus the noise term. You can test that by typing the target function minus the noise into one of the online 3D plotting programs, such as that at
6. After only 20 examples, your learned function will not yet look like the target
function. Explain in a paragraph why it looks the way it does. If your learned
function involves many peaks and valleys, then be sure to explain both their
number, their height, and their width. Suppose that, instead of tiling the
input space into an 11⇥11 grid of squares, you had divided into an 11⇥21
grid of rectangles, with the x dimension being divided twice as finely as the
y dimension. Explain how you would expect the function learned after 20
examples to change if this alternative tiling were used.
What to turn in for Part 2. Turn in 1) your edited version of, 2) a
pdf file with the printout from step 3 together the MSEs printed from step 4, 3) the
two 3D plots from step 5 (as files f20.pdf and f10000.pdf), 4) your explanations from
step 6 (as a file explain.pdf), and 5) your brief answers to the why questions in steps
3 and 4 (as a file why2.pdf). If you prefer, you can substitute .txt files for some of
the .pdf files. All these files should be combined into a zip file and submitted on