Description
In this project, you will use IMU sensor readings from gyroscopes and accelerometers to train a set of Hidden Markov Models (HMMs) and recognize different arm motion gestures. You should then be able to classify unknown arm motions in real time using your algorithm.

Training Data Download: Now available!
Repeated gestures: https://cornell.box.com/v/ECE5242Proj2-train
Single gestures (same format as test): https://cornell.box.com/v/ECE5242Proj2-train-additional

Test Data Release: 3/7/2022 9:25am, https://cornell.box.com/v/ECE5242Proj2-test

The data sets contain the raw IMU readings collected from a consumer mobile device and corresponding labels that describe the arm motions associated with the movements. You can find the coordinate system used here: https://developer.android.com/reference/android/hardware/SensorEvent.html

*Data format: each row of the IMU data has 7 columns: ts, Ax, Ay, Az, Wx, Wy, Wz (time in milliseconds, 3x accelerometer in m/s^2, 3x gyroscope in rad/sec).

Upload: on Canvas:

(1) Code (due 3/3/2022 9:25am, _project2.zip): Do not include training/test data. Include all your training and testing code, as well as your trained models. Your readme should give specific, clear instructions on which file paths to update and how to run your code in order to apply your trained models to test data (without needing to re-train your HMMs on training data). Your readme should also give a brief description of all files you included.

(2) Write-up (due 3/7/2022 11:59pm, _project2.pdf): Write a project report including the following sections: Introduction, Problem Formulation, Technical Approach, Results and Discussion. For each gesture model, you must include the training plot of your log-likelihood per epoch in your results. *Clearly presenting your approach in the report and having good algorithm performance are equally important.

Collaboration: From the syllabus: "All work that is handed in for this course should be performed solely by the individual students. However, you are allowed to discuss approaches and methods with your colleagues, but any collaborative influences should be properly documented in your reports. This applies to code that you hand in as well; submitted work must be original. Violators will receive a failing grade and be reported to the appropriate disciplinary office; see the following site for more details: https://theuniversityfaculty.cornell.edu/academic-integrity/" If you do get useful ideas from other classmates in your discussions, please list the classmate(s) near the beginning of your write-up along with a brief summary of the idea/help. You should not let other students see your code, and you should not look at another student's code. If you do need to look up some syntax for a minor coding issue online, please include the URL of the resource you used in a comment directly above the relevant lines of code in your submission. If you include example images from online sources in your write-up, please cite the source directly next to the image.

Project Steps:

1) VISUALIZE AND QUANTIZE. Visualize the raw data and experiment with discretizing the raw sensor information. You should discretize your data observations using k-means clustering to transform your continuous observations into discrete observations. We recommend 50-100 observation clusters. You can then try to manually split the recordings of repeated gestures into individual gestures by inspecting the time series of the observations. A minimal quantization sketch is given below.
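For example, one way this quantization step might look (a sketch only, assuming each recording is a whitespace-delimited text file with the 7 columns described above, and using scikit-learn's KMeans; the file names are hypothetical):

    import numpy as np
    from sklearn.cluster import KMeans

    # Hypothetical file list; substitute your own training file paths.
    train_files = ["wave01.txt", "inf11.txt", "circle12.txt"]

    # Stack the 6 sensor channels (dropping the timestamp column)
    # from all training recordings into one feature matrix.
    all_features = np.vstack([np.loadtxt(f)[:, 1:7] for f in train_files])

    # Fit k-means with M clusters (we suggest trying M in the 50-100 range).
    M = 60
    kmeans = KMeans(n_clusters=M, n_init=10, random_state=0).fit(all_features)

    # Quantize one recording into a discrete sequence of cluster indices.
    obs_seq = kmeans.predict(np.loadtxt("wave01.txt")[:, 1:7])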
2) IMPLEMENT HMM. You will use Hidden Markov Models (HMMs) to classify gesture data. You will learn one HMM for each gesture. Each HMM can compute the probability that a sequence of observations was generated by its gesture. The classification prediction across gestures will simply be the argmax over all the gesture models, that is, the model that assigns the highest probability to the sequence of observations.

Within your HMM, a hidden state will correspond to a component of a gesture (like "swooping down", "curving left", "curving right", etc.). Each hidden state has some discrete probability distribution over the (discretized) observations you might see in that state. For example, during "swooping down", the discretized IMU data might have its probability split between several different observation clusters, due to noise or variation between examples. The observation model is the probability of a measurement from a particular hidden state falling within a particular observation cluster. We recommend using 10-20 hidden states and 50-100 observation clusters.

Write down the corresponding model parameters that will need to be learned from the training data. Use the Baum-Welch algorithm to learn the model parameters. Read the Rabiner HMM Tutorial posted under the Files tab (Reading Materials folder) on Canvas. Additional hints on the notation used in that paper are available at https://alumni.media.mit.edu/~rahimi/rabiner/rabinererrata/. Be sure to include a plot of the log-likelihoods generated during your training process in your write-up.

3) TESTING. Make sure that you can apply your saved models to input sensor readings from unknown gestures. You can then compute the log-likelihood under each of the HMM models, and show how confidently you can classify the unknown motion as a particular gesture by comparing the log-likelihoods from the different models. We will release test data on the due date of the code. Please include the results of your classification of the test set in the report, including the associated log-likelihoods. Accuracy of the test set predictions will be worth 50% of the points of the coding portion of the project (which is half the total points, including the report). Include the top 3 predictions for each test gesture (we'll give partial credit if your 1st is wrong but your 2nd is right). A sketch of this classification step follows.
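As a purely illustrative sketch of the testing step (assuming each trained model is stored as numpy arrays pi of shape (N,), A of shape (N, N), and B of shape (N, M); the log-likelihood uses the scaling idea from Section V of Rabiner, and all names here are assumptions, not required structure):

    import numpy as np

    def log_likelihood(obs_seq, pi, A, B):
        # Scaled forward algorithm (cf. Rabiner, Section V).
        # obs_seq: 1-D array of discrete observation indices
        # pi: (N,) initial state distribution
        # A:  (N, N) transitions, A[i, j] = P(next state j | state i)
        # B:  (N, M) emissions,   B[i, k] = P(observation k | state i)
        alpha = pi * B[:, obs_seq[0]]
        c = alpha.sum()
        alpha /= c
        log_lik = np.log(c)
        for o in obs_seq[1:]:
            # One forward recursion step, rescaled to avoid underflow.
            alpha = (alpha @ A) * B[:, o]
            c = alpha.sum()
            alpha /= c
            log_lik += np.log(c)
        return log_lik

    def classify(obs_seq, models):
        # models: hypothetical dict mapping gesture name -> (pi, A, B).
        scores = {name: log_likelihood(obs_seq, *m) for name, m in models.items()}
        # Return the top-3 gesture names, most likely first.
        return sorted(scores, key=scores.get, reverse=True)[:3]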
Tips:

1) The data contains six different motions: Wave, Infinity, Eight, Circle, Beat3, Beat4. The names indicate their corresponding motions (disregard the numbers next to the motion names). The first set of training data contains motions repeated multiple times. The second set of training data contains each motion performed once. The test data will contain each motion performed only once.

2) Experiment and visualize your work while discretizing the raw data. This will give you an idea of how to use the data and what features you are extracting. Feature extraction, as you learned in Project 1, is a very important part of the learning process. One way to get discrete observations from your continuous features is to run k-means clustering; each new feature vector is then assigned to its closest cluster. (Advanced note: you don't have to do vector quantization if you want to use a continuous-emission (e.g., Gaussian-emission) HMM. But we strongly suggest doing vector quantization to make your HMM training easier.)

3) Choose the number of hidden states (N) and observation classes (M) with cross-validation. A good starting range to think about is N between 10 and 20 and M between 50 and 100.

4) Try different initialization strategies.

5) As in Project 1, you can set a maximum number of iterations or use a threshold to detect convergence of the learning algorithm.

6) Be aware of underflow issues. You can either do your calculations in log-space or use the scaling procedure described by Rabiner in Section V of the paper.

7) Consider emission probabilities of 0. If you never see a particular observation during gesture training, the emission probability of that observation will be set to 0 during Baum-Welch. Then, if at test time you see that observation at some timepoint in your test data, your model thinks the probability of the entire trajectory is 0. Why is that a problem? It is a problem if you want to compare two models to see which one is more likely and both return a probability of 0: you can't compare two 0's to figure out which is bigger. What can you do about it? One solution is to insist that your emission model never return exactly 0 probability. For example, you could add 1e-8 to your emission probability table to ensure that no probability is exactly zero. Intuitively, this adjusts your model to say "it's really, really unlikely to see this observation, because we've never seen it in the training data" rather than "it's impossible to see this observation, because I've never seen it in the training data". A sketch of this fix appears below.
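A minimal sketch of that fix (assuming B is the (N, M) emission matrix estimated by Baum-Welch; the 1e-8 value is the one suggested above):

    import numpy as np

    def smooth_emissions(B, eps=1e-8):
        # Replace exact zeros with a tiny probability, then renormalize each
        # row so every state's emission distribution still sums to 1.
        B = B + eps
        return B / B.sum(axis=1, keepdims=True)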
8) If you're struggling to see whether your HMM training is correct, first check that your log-likelihood training plots make sense to you. Additionally, you can come up with example toy data to see if your algorithm converges correctly. You could use something like the following, very simple toy data:

   1) The data is from a process that stays in hidden state 0 for 1000 steps, then in hidden state 1 for 1000 steps, then in hidden state 0 for 1000 steps.
   2) Each hidden state has emission probabilities:
      In state 0: always emit the observation with label 0.
      In state 1: emit the observation with label 0 half the time and label 1 half the time.

The Python code to generate the associated observations for that toy example is:

    X = np.concatenate((np.zeros((1000, 1)),
                        np.random.randint(0, 2, size=(1000, 1)),
                        np.zeros((1000, 1))))

The model that best fits this data should look something like:

Transition matrix:

    [[0.999, 0.001],
     [0.001, 0.999]]

Emission matrix (row index indicates which state you're in, column index indicates which observation you're emitting):

    [[1.0, 0.0],
     [0.5, 0.5]]

Since there's randomness involved in the toy data (and since there's the possibility that your Baum-Welch gets caught in a local maximum), it's OK if your results aren't exactly the optimal values, but being often close to those optimal values is a good sign that your HMM training algorithm is probably working.

9) Use matrix multiplication. After you're done getting your code to work, you may find that your code is very slow. If so, this tip may be useful for speeding up your code without changing any of the mathematical calculations it does. In Python, matrix multiplication using numpy's dot product function is much, much, MUCH faster than matrix multiplication written by hand with for loops (about a thousand times faster). So, if you have any places in your code that look at all like:

    new = np.zeros((50, 100))
    for i in range(50):
        for j in range(100):
            for k in range(50):
                new[i, j] += A[i, k] * B[k, j]

you really should replace that with something that looks like:

    new = A.dot(B)

10) Consider using a left-to-right HMM transition structure to simplify training (only if you've split apart the repeated training gestures into individual training gestures). A sketch of one such initialization follows.
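For example, one possible left-to-right initialization (a sketch only; the 0.9 self-transition weight is an arbitrary illustrative choice, and Baum-Welch re-estimation keeps zero transition entries at zero, so the structure is preserved during training):

    import numpy as np

    def left_to_right_A(N, self_prob=0.9):
        # Each state either stays put or advances to the next state;
        # all other transitions start (and therefore remain) at zero.
        A = np.zeros((N, N))
        for i in range(N - 1):
            A[i, i] = self_prob
            A[i, i + 1] = 1.0 - self_prob
        A[N - 1, N - 1] = 1.0  # final state absorbs
        return A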