CSCI 4100/6100 RPI Machine Learning From Data
LFD is the class textbook
1. (200) Exercise 3.4 in LFD
2. (200) Problem 3.1 in LFD
3. (200) Problem 3.2 in LFD
4. (200) Problem 3.8 in LFD
5. (200) Problem 3.6 in LFD (6xxx Level Only)
6. (200) Handwritten Digits Data – Obtaining Features
Download the two handwritten digits data files: training data (ZipDigits.train) and test data (ZipDigits.test). Each row is one data example. The first entry is the digit, and the next 256 entries are grayscale values between -1 and 1. The 256 pixels correspond to a 16 × 16 image. For this problem, we will use only the digits 1 and 5, so remove all other digits from your training and test examples.
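The row format described above can be parsed with a few lines of standard-library Python. This is a minimal sketch, not a prescribed solution: the names `load_digits` and `to_image` are hypothetical, and it assumes that entries are separated by whitespace and that the 256 pixels are stored row by row. Check both assumptions against the actual files.

```python
from typing import Iterable, List, Tuple

def load_digits(lines: Iterable[str], keep=(1, 5)) -> List[Tuple[int, List[float]]]:
    """Parse rows of the form: digit followed by 256 grayscale values.

    Assumption: entries are whitespace-separated. Only digits in `keep`
    are returned; all other rows are dropped.
    """
    examples = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        digit = int(float(fields[0]))  # the label may be written as "1.0000"
        pixels = [float(v) for v in fields[1:]]
        if len(pixels) != 256:
            raise ValueError(f"expected 256 pixel values, got {len(pixels)}")
        if digit in keep:
            examples.append((digit, pixels))
    return examples

def to_image(pixels: List[float], side: int = 16) -> List[List[float]]:
    """Reshape a flat list into `side` rows of `side` values.

    Assumption: pixels are stored in row-major order.
    """
    return [pixels[r * side:(r + 1) * side] for r in range(side)]
```

Because the function accepts any iterable of lines, an open file works directly, for example `with open("ZipDigits.train") as f: train = load_digits(f)`. The nested list returned by `to_image` can be passed to a plotting routine to display a single digit.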
(a) Familiarize yourself with the data by giving a plot of two of the digit images.
(b) Develop two features to measure properties of the image that would be useful in distinguishing between 1 and 5. You may use symmetry and average intensity (as discussed in class) or anything else you think will work better. Give the mathematical definition of your two features.
(c) As in the text, give a 2-D scatter plot of your features: for each data example, plot the two features with a red ‘×’ if it is a 5 and a blue ‘◦’ if it is a 1.
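As an illustration of part (b), the sketch below computes one possible version of the two suggested features. The definitions are assumptions chosen for illustration, not necessarily those given in class: average intensity is taken as the mean of the 256 pixel values, and symmetry as the negative mean absolute difference between the image and its left-right mirror image, so a perfectly symmetric image scores 0 and less symmetric images score lower. The function names are hypothetical, and row-major pixel order is assumed.

```python
from typing import List

def average_intensity(pixels: List[float]) -> float:
    """Mean grayscale value over all pixels (assumed definition)."""
    return sum(pixels) / len(pixels)

def horizontal_symmetry(pixels: List[float], side: int = 16) -> float:
    """Negative mean absolute difference between the image and its
    left-right mirror (assumed definition; 0 means perfectly symmetric).

    Assumption: pixels are stored in row-major order.
    """
    total = 0.0
    for r in range(side):
        row = pixels[r * side:(r + 1) * side]
        for c in range(side):
            # compare each pixel with its mirror in the same row
            total += abs(row[c] - row[side - 1 - c])
    return -total / len(pixels)
```

Each example then maps to the point (average intensity, symmetry) for the scatter plot in part (c). With matplotlib, for instance, `plt.scatter(..., marker="x", color="red")` would draw the 5s and `plt.scatter(..., marker="o", facecolors="none", edgecolors="blue")` would draw the 1s.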