Description
Goal
Data Set
The data set includes time stamps with the date and hour/minute/second within the day. You are not to use time stamp features for predicting occupancy: since this is a commercial office building, the time stamp is a strong predictor of occupancy. Rather, the goal is to determine whether occupancy can be sensed from: (1) temperature, expressed in degrees Celsius, (2) relative humidity, expressed as a %, (3) light, in lux, (4) CO2, in ppm, and (5) the humidity ratio, which is derived from the temperature and the relative humidity.
The training data are to be found here. The test data are to be found here. There are 8144 training examples and 9753 test examples.
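As a starting point, here is a minimal loading sketch. The column names follow the UCI occupancy-detection CSV release, which is an assumption — verify them against the actual files. The StringIO sample stands in for the real training file.

```python
import io
import pandas as pd

# Two sample rows standing in for the real training CSV (values invented).
# Column names are assumed from the UCI occupancy-detection release.
sample = io.StringIO(
    "date,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy\n"
    "2015-02-04 17:51:00,23.18,27.272,426.0,721.25,0.00479,1\n"
    "2015-02-04 17:55:00,23.15,27.2675,429.5,714.0,0.00478,0\n"
)
df = pd.read_csv(sample)

# Keep only the five permitted features; the date column must be dropped.
features = ["Temperature", "Humidity", "Light", "CO2", "HumidityRatio"]
X = df[features].to_numpy()      # inputs: the five sensor readings
y = df["Occupancy"].to_numpy()   # target: 0 = empty, 1 = occupied
print(X.shape, y.tolist())
```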
Part 1
(1a) Report the training and test set performance in terms of % examples classified correctly.
Remember an important property of the perceptron algorithm: it is guaranteed to converge only if there is a setting of the weights that classifies the training set perfectly. (The learning rule corrects errors, so when all examples are classified correctly, the weights stop changing.) With a noisy data set like this one, the algorithm will not find an exact solution. Also remember that the perceptron algorithm is not performing gradient descent. Instead, it will jitter around the solution, continually changing the weights from one iteration to the next. The weight changes will have a small effect on performance, so you’ll see training set performance jitter a bit as well.
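The error-correcting rule described above can be sketched as follows. The data here are synthetic stand-ins for the five sensor features (not the assignment's data), so the exact accuracy is illustrative; note that on noisy labels the accuracy jitters from epoch to epoch rather than settling.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))             # 500 examples, 5 stand-in features
w_true = rng.normal(size=5)
# Noisy labels: no weight vector classifies all of these perfectly.
y = (X @ w_true + rng.normal(scale=0.5, size=500) > 0).astype(int)

w = np.zeros(5)
b = 0.0
for epoch in range(20):
    for xi, ti in zip(X, y):
        pred = int(xi @ w + b > 0)
        # Error-correcting update: only misclassified examples change weights.
        w += (ti - pred) * xi
        b += (ti - pred)
    # Training accuracy after this epoch; it jitters and stays below 100%.
    acc = np.mean((X @ w + b > 0).astype(int) == y)
print(f"final training accuracy: {acc:.3f}")
```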
Part 2
(2a) Decide what error function you wish to use. Two obvious candidates are squared error and cross entropy. Report which you have picked.
(2b) As a way of verifying that your network learns something beyond the prior statistics of the training set, let’s compute a measure of baseline performance. For the baseline, use the training set to determine the constant output level of the network, call it C, that will minimize your error measure. That is, assume your net doesn’t learn to respond to the inputs, but rather gives its best guess of the output without regard to the input: whether the target is 0 or 1, the network outputs the same constant C. Using your error measure, solve for C and compute the baseline error. Report the baseline error.
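For both candidate error measures the optimization works out the same way: setting the derivative of the summed squared error, sum_i (t_i - C)^2, to zero gives C = mean(t), and setting the cross-entropy derivative, -sum_i (t_i/C - (1-t_i)/(1-C)), to zero gives the same C. A small sketch with stand-in targets (the real C comes from the actual training labels):

```python
import numpy as np

# Stand-in targets; with the real data, t is the vector of training labels.
t = np.array([0, 0, 0, 1, 1, 0, 1, 0, 0, 1], dtype=float)
C = t.mean()   # optimal constant for squared error AND for cross entropy

# Per-example baseline error under each measure.
sq_baseline = np.mean((t - C) ** 2)
ce_baseline = -np.mean(t * np.log(C) + (1 - t) * np.log(1 - C))

# Sanity check: any other constant does at least as badly on squared error.
for c in (0.1, 0.5, 0.9):
    assert np.mean((t - c) ** 2) >= sq_baseline
print(f"C = {C}, squared-error baseline = {sq_baseline:.4f}")
```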
(2c) Using a network with H=5 hidden units, and mini-batches of size N=100, select a learning rate (or a learning rate schedule) that results in fairly consistent drops in error from one epoch to the next, make a plot of the training error as a function of epochs. On this graph, show a constant horizontal line for the baseline error. If your network doesn’t drop below this baseline, there’s something going awry. For now, train your net until you’re pretty sure the training error isn’t dropping further (i.e., a local optimum has been reached).
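One way to sketch the mini-batch loop (H=5, N=100, cross entropy, plain SGD). The data, learning rate, and epoch count here are illustrative assumptions standing in for the real data and the rate you tune in (2c); the per-epoch `err` values are what you would plot against the baseline line.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))                 # stand-in inputs
y = (X[:, 0] + X[:, 2] > 0).astype(float)      # stand-in targets

H, N, lr = 5, 100, 0.5                         # lr is an illustrative choice
W1 = rng.normal(scale=0.5, size=(5, H)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.5, size=H);      b2 = 0.0
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for epoch in range(100):
    perm = rng.permutation(len(X))
    for start in range(0, len(X), N):
        idx = perm[start:start + N]
        xb, tb = X[idx], y[idx]
        h = sigmoid(xb @ W1 + b1)              # hidden activations
        o = sigmoid(h @ W2 + b2)               # output probability
        # With cross entropy + sigmoid output, the output delta is (o - t).
        do = (o - tb) / len(idx)
        dW2 = h.T @ do;  db2 = do.sum()
        dh = np.outer(do, W2) * h * (1 - h)    # backprop through hidden layer
        dW1 = xb.T @ dh; db1 = dh.sum(axis=0)
        W2 -= lr * dW2;  b2 -= lr * db2
        W1 -= lr * dW1;  b1 -= lr * db1
    # Training cross entropy after this epoch (the quantity to plot).
    o = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
    err = -np.mean(y * np.log(o + 1e-12) + (1 - y) * np.log(1 - o + 1e-12))
print(f"final training cross entropy: {err:.3f}")
```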
(2d) Report the learning rate (or learning rate schedule) you used to produce the plot in (2c).
(2e) Report training and test set performance in terms of % examples classified correctly.
(2f) Now train nets with varying size, H, in {1, 2, 5, 10, 20}. You may have to adjust your learning rates based on H, or use one of the heuristics in the text for setting learning rates to be independent of H. Decide when to stop training the net based on training set performance. Make a plot, as a function of H, of the training and test set performance in terms of % examples classified correctly.
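A skeleton for the H sweep. The 1/sqrt(H) learning-rate scaling is one common heuristic for keeping update sizes comparable across widths (not necessarily the one in your text), and the stopping rule here — stop once the training error has improved by less than `tol` for `patience` consecutive epochs — is one reasonable way to "decide based on training set performance."

```python
import math

def should_stop(errors, tol=1e-4, patience=5):
    """True once each of the last `patience` epochs improved by less than tol."""
    if len(errors) <= patience:
        return False
    recent = errors[-(patience + 1):]
    return all(prev - cur < tol for prev, cur in zip(recent, recent[1:]))

lr0 = 0.3   # illustrative base rate
for H in (1, 2, 5, 10, 20):
    lr = lr0 / math.sqrt(H)
    # ... build a net with H hidden units, train with learning rate lr,
    # append each epoch's training error to a list `errors`, and break
    # out of the epoch loop once should_stop(errors) returns True ...
```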
Part 3 (Extra Credit)
(3a) Determine an appropriate representation for the time of day. Describe the representation you used. For example, you might add one unit with a value ranging from 0 to 1 for times ranging from 00:00 to 23:59. Report the representation you selected.
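Two candidate encodings as a sketch: the linear unit suggested above, and a sine/cosine pair, which avoids the midnight discontinuity (23:59 and 00:00 map to nearby points rather than opposite ends of the range). Both function names are illustrative.

```python
import math

def linear_time(hour, minute):
    """Single unit: 00:00 -> 0.0, 23:59 -> 1.0 (1439 minutes span the day)."""
    return (hour * 60 + minute) / 1439.0

def cyclic_time(hour, minute):
    """Two units encoding time as a point on a circle, so midnight wraps."""
    frac = (hour * 60 + minute) / 1440.0
    return (math.sin(2 * math.pi * frac), math.cos(2 * math.pi * frac))

print(linear_time(12, 0))
```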
(3b) Train your net with H=5 hidden units and compare training and test set performance to the net you built in (2e).

