Mobile Eye Gaze Estimation


1 Overview

In this final project, you will leverage the deep learning techniques learned through the semester to tackle a challenging problem: mobile eye gaze estimation. Eye gaze estimation is the task of predicting the direction or position you are looking at. Specifically for this project, we focus on mobile eye gaze estimation, that is, predicting the gaze position on a phone/tablet screen. Overall, our input is the face image captured by the front camera of the phone, and the output is a 2D position (x, y) on the phone screen. The following instructions will guide you through building an accurate mobile eye gaze estimator.
2 Original Dataset and Models
Figure 1: Overall architecture.
The original dataset comes from the GazeCapture project. The dataset involves over 1,400 subjects and results in more than 2 million face images. Their model architecture is illustrated in Fig. 1. The network involves 4 pathways. Besides face images, they also add eye crops for both the left and right eyes, since eye gaze is mostly determined by eye appearance. In addition, as the gaze prediction (x, y) on the screen also depends on the head/eye position, a face mask indicating the face position within the whole image is also used. More details can be found on the project page or in their papers.
3 Our Dataset

Due to limited computation power, we create a much smaller dataset with 48,000 training samples, 5,000 validation samples, and 7,000 reserved testing samples. Each sample contains 5 items: face, left eye, right eye, face mask, and labels. The dataset can be downloaded here, and you can use the following code to read the data.
import numpy as np

# Load the training and validation splits from the compressed archive
npzfile = np.load("train_and_val.npz")

train_eye_left = npzfile["train_eye_left"]
train_eye_right = npzfile["train_eye_right"]
train_face = npzfile["train_face"]
train_face_mask = npzfile["train_face_mask"]
train_y = npzfile["train_y"]

val_eye_left = npzfile["val_eye_left"]
val_eye_right = npzfile["val_eye_right"]
val_face = npzfile["val_face"]
val_face_mask = npzfile["val_face_mask"]
val_y = npzfile["val_y"]
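After loading, a quick sanity check on shapes and dtypes is worthwhile. The normalization line below is a sketch that assumes the images are stored as 8-bit integers; verify against the printed dtypes.

# Print the shape and dtype of each training array
for name in ["train_eye_left", "train_eye_right", "train_face",
             "train_face_mask", "train_y"]:
    print(name, npzfile[name].shape, npzfile[name].dtype)

# If images are uint8, scale them to [0, 1] floats before training
train_face = train_face.astype(np.float32) / 255.0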
4 Potential Models

The goal of this project is to estimate eye gaze as accurately as possible. You can use whatever you have learned in class or online to build a proper deep eye gaze estimator. The model architecture and hyperparameters are all up to you. You are free to use a similar architecture with four pathways as in the original paper, or you can design your own model architecture. Besides, you also need to choose proper hyperparameters such as filter size, number of filters, etc.
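As a starting point, here is a minimal sketch of a four-pathway network in TensorFlow 1.x. The input shapes (64x64x3 eye/face crops, a 25x25 face mask) and all layer sizes are assumptions for illustration only; check them against the actual arrays in the dataset and treat this as a skeleton, not a recommended architecture.

import tensorflow as tf

def conv_pathway(x, scope):
    # A small conv stack; each pathway gets its own weights
    with tf.variable_scope(scope):
        h = tf.layers.conv2d(x, 32, 5, activation=tf.nn.relu)
        h = tf.layers.max_pooling2d(h, 2, 2)
        h = tf.layers.conv2d(h, 64, 3, activation=tf.nn.relu)
        h = tf.layers.max_pooling2d(h, 2, 2)
        return tf.layers.flatten(h)

# Placeholders for the four pathway inputs (shapes are assumptions)
eye_left = tf.placeholder(tf.float32, [None, 64, 64, 3], name="eye_left")
eye_right = tf.placeholder(tf.float32, [None, 64, 64, 3], name="eye_right")
face = tf.placeholder(tf.float32, [None, 64, 64, 3], name="face")
face_mask = tf.placeholder(tf.float32, [None, 25, 25], name="face_mask")
y = tf.placeholder(tf.float32, [None, 2], name="y")

# Conv pathways for the three image inputs, a dense pathway for the mask
f_left = conv_pathway(eye_left, "left_eye")
f_right = conv_pathway(eye_right, "right_eye")
f_face = conv_pathway(face, "face_path")
f_mask = tf.layers.dense(tf.layers.flatten(face_mask), 128,
                         activation=tf.nn.relu)

# Merge all four pathways and regress the 2D gaze position (x, y)
merged = tf.concat([f_left, f_right, f_face, f_mask], axis=1)
hidden = tf.layers.dense(merged, 128, activation=tf.nn.relu)
predict_op = tf.layers.dense(hidden, 2, name="predict_op")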
4.1 Architecture Visualization

For this project, as everyone might use a different architecture, you are required to visualize your model architecture through TensorBoard. After launching TensorBoard, navigate to GRAPHS on the top panel and download the PNG image. It is also suggested to explore other utilities of TensorBoard. Fig. 2 shows an example TensorBoard visualization with the four-pathway input.
Figure 2: Four-pathway graph TensorBoard visualization.
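To get the graph into TensorBoard, it is enough to write it out with a summary writer; a minimal sketch, assuming the default graph and a log directory named "logs":

# Write the graph definition so TensorBoard can render it
with tf.Session() as sess:
    writer = tf.summary.FileWriter("logs", sess.graph)
    writer.close()

Then launch TensorBoard with "tensorboard --logdir logs" and open the GRAPHS tab.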
4.2 Model Evaluation

Different from previous assignments on classification, this project requires regression. Assuming we have N testing samples, your prediction yp should be of size N x 2. To evaluate the model, we use the mean Euclidean distance between yp and the ground-truth positions y:
err = np.mean(np.sqrt(np.sum((yp - y)**2, axis=1)))
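Equivalently, in math notation, with y_p^(i) the i-th predicted position:

\mathrm{err} = \frac{1}{N} \sum_{i=1}^{N} \left\lVert y_p^{(i)} - y^{(i)} \right\rVert_2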
However, you are not required to use the above error as your loss function.
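For example, one common choice is to train on the mean squared error and report the Euclidean error separately; a sketch reusing predict_op and y from the model sketch above (the optimizer and learning rate are illustrative choices, not requirements):

# Train on mean squared distance over the two coordinates
loss = tf.reduce_mean(tf.reduce_sum(tf.square(predict_op - y), axis=1))
train_step = tf.train.AdamOptimizer(1e-3).minimize(loss)

# Report the mean Euclidean distance, matching the evaluation metric
err_op = tf.reduce_mean(tf.sqrt(tf.reduce_sum(tf.square(predict_op - y), axis=1)))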
4.3 Fair Comparison

We have considered the case of students with less computation power, which restricts them from using complex models to get better accuracy. For a fair comparison, we first limit the training set size to 48,000 samples, which penalizes complex models that tend to over-fit. Second, the grading rubric ensures that simple models (below a certain prediction error) can get at least 70% of the model-evaluation credit.
4.4 Computational Time and Expected Error

We have run a benchmark with only one pathway (left eye) as input, using 3 convolutional layers and 2 pooling layers. Fig. 3 shows an example convergence curve. Note that it has not yet converged, and more training epochs should give better results. Training on a Lenovo T440s machine with 8 GB of RAM takes around 6 hours for 25 epochs and reaches an error of 3 cm.
Figure 3: Loss and error for training and testing (training loss, training error, and testing error vs. epochs; loss on the left axis, error in cm on the right axis).
4.5 Model Saving

In order to evaluate your models, we need access to the placeholders for all 4 pathway inputs as well as the prediction operator. Notice this does not mean you need to use all 4 pathway inputs: if you choose to build a model with only one pathway, you still need to create placeholders for the other 3 pathways and add them to the collection "validation_nodes":
# The nodes below are retrieved at grading time via this collection
tf.get_collection("validation_nodes")

# Add the four pathway placeholders and the prediction op to the collection
tf.add_to_collection("validation_nodes", eye_left)
tf.add_to_collection("validation_nodes", eye_right)
tf.add_to_collection("validation_nodes", face)
tf.add_to_collection("validation_nodes", face_mask)
tf.add_to_collection("validation_nodes", predict_op)

# Save the graph and trained weights
saver = tf.train.Saver()
save_path = saver.save(sess, "my-model")
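For reference, restoring the saved model and collection looks roughly like the following; this is a sketch assuming the validation arrays from Section 3 are in scope, and the actual validation script may differ.

# Restore the graph, weights, and the validation_nodes collection
with tf.Session() as sess:
    saver = tf.train.import_meta_graph("my-model.meta")
    saver.restore(sess, "my-model")
    eye_left, eye_right, face, face_mask, predict_op = \
        tf.get_collection("validation_nodes")
    preds = sess.run(predict_op, feed_dict={eye_left: val_eye_left,
                                            eye_right: val_eye_right,
                                            face: val_face,
                                            face_mask: val_face_mask})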
5 Submission

You must submit the following items for this project:

• Your source code
• Your saved model "my-model"
  – With a graph collection containing all four pathway inputs and the prediction node
  – Make sure to validate your model by running:
python validation_script.py my-model -v train_and_val.npz
• Write-up
  – Loss and error plots (Fig. 3)
  – Model architecture visualization (Sec. 4.1 and Fig. 2)
  – Explanation and justification
6 Grading Rubric

• 5 pts for code quality
  – Appropriate TensorFlow functions are used.
  – Readable and commented where necessary.
  – Not plagiarized.
• 30 pts for final error (in cm)
  – error ≥ 4.0: 5 pts, with a model that loads.
  – 4.0 > error ≥ 3.0: 10 pts.
  – 3.0 > error ≥ 2.6: 20 pts.
  – error < 2.6: ranked among all students and assigned credit accordingly; first place gets full credit.
• 25 pts for write-up
  – 5 pts for clear explanation and justification.
  – 5 pts for loss and error plots.
  – 10 pts for clear and well-organized model architecture visualization.
  – 5 pts for architecture comparison or different-pathway comparison.

7 Questions

For any questions on the dataset and the model, contact Kang Wang at wangk10@rpi.edu.