Description
The Challenge
Automatic image processing is a key component of many AI systems, including facial recognition and video compression. One basic processing method is segmentation, by which we divide an image into a fixed number of components in order to simplify its representation. For example, we can train a mixture of Gaussians to represent an image and segment it according to the simplified representation, as shown in the images below.
In this assignment, you will learn to perform image segmentation. To this end, you will implement Gaussian mixture models and iteratively improve their performance. You will perform this segmentation on the “Bird” and “Party Spock” images included with the assignment.
About the Assignment
The tests for the assignment are provided in the notebook, so Bonnie is only for submission purposes. The tests on Bonnie will be similar to the ones provided here, but the images being tested against and the values used in the calculations will be different.
Thus, you will be allowed only 5 submissions on Bonnie. Make sure you test everything before submitting. The score for your last submission counts. The code will be allowed to run for no more than 2 hours per submission. To keep the run time down, make sure to vectorize your code (more on this below).
Your Mission (should you choose to accept it)
Your assignment is to implement several methods of image segmentation, with increasing complexity.
- Implement k-means clustering to segment a color image.
- Build a Gaussian mixture model to be trained with expectation-maximization.
- Experiment with varying the details of the Gaussian mixture model’s implementation.
- Implement and test a new metric, the Bayesian information criterion, which helps ensure a more robust image segmentation.
Grading
The grade you receive for the assignment will be distributed as follows:
- k-Means Clustering (20 points)
- Gaussian Mixture Model (40 points)
- Model Performance Improvements (20 points)
- Bayesian Information Criterion (20 points)
- Bonus
Resources
The em.pdf chapter in the assignment folder gives a good explanation of implementing and training mixture models; see in particular pages 424-427 (k-means) and 435-439 (mixture models and EM). See also The Elements of Statistical Learning, pages 291-295.
Background
A Gaussian mixture model is a generative model for representing the underlying probability distribution of a complex collection of data, such as the collection of pixels in a grayscale photograph.
In the context of this problem, a Gaussian mixture model defines the joint probability f(x) as

$$f(x) = \sum_{i=1}^{k} m_i \, N_i(x \mid \mu_i, \sigma_i^2)$$

where x is a grayscale value in [0, 1], f(x) is the joint probability of that grayscale value, m_i is the mixing coefficient on component i, and N_i is the i-th Gaussian distribution underlying the value x, with mean μ_i and variance σ_i².
We will be using this model to segment photographs into different grayscale regions. The idea of segmentation is to assign a component i to each pixel x using the maximum posterior probability

$$\arg\max_i \; \frac{m_i \, N_i(x \mid \mu_i, \sigma_i^2)}{f(x)}$$

Then we will replace each pixel in the image with its corresponding μ_i to produce a result as below (original above, segmented with three components below).
from __future__ import division
import warnings
warnings.simplefilter(action = "ignore", category = FutureWarning)
import numpy as np
import scipy as sp
from matplotlib import image
import matplotlib.pyplot as plt
import matplotlib.cm as cm
"""Helper image-processing code.
These have been added in a separate python file and added in to the repo.
The functions below have all been imported in to your submission file"""
def image_to_matrix(image_file, grays=False):
"""
Convert .png image to matrix
of values.
params:
image_file = str
grays = Boolean
returns:
img = (color) np.ndarray[np.ndarray[np.ndarray[float]]]
or (grayscale) np.ndarray[np.ndarray[float]]
"""
img = image.imread(image_file)
# in case of transparency values
if(len(img.shape) == 3 and img.shape[2] > 3):
height, width, depth = img.shape
new_img = np.zeros([height, width, 3])
for r in range(height):
for c in range(width):
new_img[r,c,:] = img[r,c,0:3]
img = np.copy(new_img)
if(grays and len(img.shape) == 3):
height, width = img.shape[0:2]
new_img = np.zeros([height, width])
for r in range(height):
for c in range(width):
new_img[r,c] = img[r,c,0]
img = new_img
return img
def matrix_to_image(image_matrix, image_file):
"""
Convert matrix of color/grayscale
values to .png image
and save to file.
params:
image_matrix = (color) numpy.ndarray[numpy.ndarray[numpy.ndarray[float]]] or (grayscale) numpy.ndarray[numpy.ndarray[float]]
image_file = str
"""
# provide cmap to grayscale images
cMap = None
if(len(image_matrix.shape) < 3):
cMap = cm.Greys_r
image.imsave(image_file, image_matrix, cmap=cMap)
def flatten_image_matrix(image_matrix):
"""
Flatten image matrix from
Height by Width by Depth
to (Height*Width) by Depth
matrix.
params:
image_matrix = (color) numpy.ndarray[numpy.ndarray[numpy.ndarray[float]]] or (grayscale) numpy.ndarray[numpy.ndarray[float]]
returns:
flattened_values = (color) numpy.ndarray[numpy.ndarray[float]] or (grayscale) numpy.ndarray[float]
"""
if(len(image_matrix.shape) == 3):
height, width, depth = image_matrix.shape
else:
height, width = image_matrix.shape
depth = 1
return image_matrix.reshape(height*width, depth)
def unflatten_image_matrix(image_matrix, width):
"""
Unflatten image matrix from
(Height*Width) by Depth to
Height by Width by Depth matrix.
params:
image_matrix = (color) numpy.ndarray[numpy.ndarray[float]] or (grayscale) numpy.ndarray[float]
width = int
returns:
unflattened_values = (color) numpy.ndarray[numpy.ndarray[numpy.ndarray[float]]] or (grayscale) numpy.ndarray[numpy.ndarray[float]]
"""
heightWidth = image_matrix.shape[0]
height = int(heightWidth / width)
if(len(image_matrix.shape) > 1 and image_matrix.shape[-1] != 1):
depth = image_matrix.shape[-1]
return image_matrix.reshape(height, width, depth)
else:
return image_matrix.reshape(height, width)
def image_difference(image_values_1, image_values_2):
"""
Calculate the total difference
in values between two images.
Assumes that both images have same
shape.
params:
image_values_1 = (color) numpy.ndarray[numpy.ndarray[numpy.ndarray[float]]] or (grayscale) numpy.ndarray[numpy.ndarray[float]]
image_values_2 = (color) numpy.ndarray[numpy.ndarray[numpy.ndarray[float]]] or (grayscale) numpy.ndarray[numpy.ndarray[float]]
returns:
dist = float
"""
flat_vals_1 = flatten_image_matrix(image_values_1)
flat_vals_2 = flatten_image_matrix(image_values_2)
N, depth = flat_vals_1.shape
dist = 0.
point_thresh = 0.005
for i in range(N):
if(depth > 1):
new_dist = sum(abs(flat_vals_1[i] - flat_vals_2[i]))
if(new_dist > depth * point_thresh):
dist += new_dist
else:
new_dist = abs(flat_vals_1[i] - flat_vals_2[i])
if(new_dist > point_thresh):
dist += new_dist
return dist
image_dir = 'images/'
image_file = 'party_spock.png'
values = image_to_matrix(image_dir + image_file)
print(values)
Part 0: Note on Vectorization
The concept of Vectorization was introduced in the last section of Assignment 4. For this assignment, please vectorize your code wherever possible using numpy arrays, instead of running for-loops over the images being processed.
For an example of how this might be useful, consider the following array:
A = [12, 34, 1234, 764, … (a million values) …, 91, 78]
Now you need to calculate another array B with the same dimensions as A. Say each value in B is computed as follows:
(each value in B) = sqrt((some constant) * pi * log(k) * (corresponding value in A)) / 7
You might be tempted to use a for-loop to compute this, but it would take a very long time to run on an array of this size.
Alternatively, you can use numpy and perform the calculation in a single line. Pass A as a numpy array, and the entire computation is done element-wise, populating B with the corresponding values of the formula.
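For instance, here is a minimal sketch of that computation in numpy (the constant k is hypothetical and only illustrates the formula above); the arithmetic is applied element-wise to the whole array, so no explicit loop is needed:
import numpy as np

A = np.random.rand(1000000)               # stand-in for the million-value array
k = 5.0                                   # hypothetical constant from the formula above
B = np.sqrt(np.pi * np.log(k) * A) / 7.0  # computed for every element in one line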
Part 1: K-means clustering
20 pts
One easy method for image segmentation is to simply cluster all similar data points together and then replace their values with the mean value. Thus, we'll warm up with k-means clustering, which will also provide a baseline to compare your segmentation against. Please note that clustering will come in handy later.
Fill out k_means_cluster() to convert the original image values matrix to its clustered counterpart. Your convergence test should be whether the assigned clusters stop changing. (Note that this convergence test is rather slow.) When no initial cluster means are provided, k_means_cluster() should choose k random points from the data (without replacement) to use as the initial cluster means.
For this part of the assignment, since clustering is best used on multidimensional data, we will be using the color image bird_color_24.png.
You can test your implementation of k-means using our reference images in k_means_test().
Try to vectorize your code so that it runs faster; without vectorization, the code takes 25-30 minutes to run.
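As a small illustration (a sketch only, not the required implementation; the helper name is ours), choosing k initial means without replacement can be done directly with numpy:
import numpy as np

def choose_initial_means(flat_values, k):
    """Sketch: pick k distinct pixels (rows of an (N, depth) array such as the
    output of flatten_image_matrix()) to serve as starting cluster means."""
    indices = np.random.choice(flat_values.shape[0], k, replace=False)
    return flat_values[indices]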
from random import randint
from functools import reduce
def k_means_cluster(image_values, k=3, initial_means=None):
"""
Separate the provided RGB values into
k separate clusters using the k-means algorithm,
then return an updated version of the image
with the original values replaced with
the corresponding cluster values.
params:
image_values = numpy.ndarray[numpy.ndarray[numpy.ndarray[float]]]
k = int
initial_means = numpy.ndarray[numpy.ndarray[float]] or None
returns:
updated_image_values = numpy.ndarray[numpy.ndarray[numpy.ndarray[float]]]
"""
# TODO: finish this function
raise NotImplementedError()
return updated_image_values
def k_means_test():
"""
Testing your implementation
of k-means on the segmented
bird_color_24 reference images.
"""
k_min = 2
k_max = 6
image_dir = 'images/'
image_name = 'bird_color_24.png'
image_values = image_to_matrix(image_dir + image_name)
# initial mean for each k value
initial_means = [
np.array([[0.90980393,0.8392157,0.65098041],[0.83137256,0.80784315,0.69411767]]),
np.array([[0.90980393,0.8392157,0.65098041],[0.83137256,0.80784315,0.69411767],[0.67450982,0.52941179,0.25490198]]),
np.array([[0.90980393,0.8392157,0.65098041],[0.83137256,0.80784315,0.69411767],[0.67450982,0.52941179,0.25490198],[0.86666667,0.8392157,0.70588237]]),
np.array([[0.90980393,0.8392157,0.65098041],[0.83137256,0.80784315,0.69411767],[0.67450982,0.52941179,0.25490198],[0.86666667,0.8392157,0.70588237],[0,0,0]]),
np.array([[0.90980393,0.8392157,0.65098041],[0.83137256,0.80784315,0.69411767],[0.67450982,0.52941179,0.25490198],[0.86666667,0.8392157,0.70588237],[0,0,0],[0.8392157,0.80392158,0.63921571]]),
]
# test different k values to find best
for k in range(k_min, k_max+1):
updated_values = k_means_cluster(image_values, k, initial_means[k-k_min])
ref_image = image_dir + 'k%d_%s'%(k, image_name)
ref_values = image_to_matrix(ref_image)
dist = image_difference(updated_values, ref_values)
print('Image distance = %.2f'%(dist))
if(int(dist) == 0):
print('Clustering for %d clusters produced a realistic image segmentation.'%(k))
else:
print('Clustering for %d clusters didn\'t produce a realistic image segmentation.'%(k))
k_means_test()
Part 2: Implementing a Gaussian mixture model
40 pts
Next, we will step beyond clustering and implement a complete Gaussian mixture model.
Complete the below implementation of GaussianMixtureModel so that it can perform the following:
- Calculate the joint log probability of a given grayscale value. (5 points)
- Use expectation-maximization (EM) to train the model to represent the image as a mixture of Gaussians. (20 points) To initialize EM, set each component's mean to the grayscale value of a randomly chosen pixel, each variance to 1, and the mixing coefficients to a uniform distribution. Note: there are packages that can run EM automagically, but please implement your own version of EM without using these extra packages. We've set the convergence condition for you in GaussianMixtureModel.default_convergence(): if the new likelihood is within 10% of the previous likelihood for 10 consecutive iterations, the model has converged.
- Calculate the log likelihood of the trained model. (5 points)
- Segment the image according to the trained model. (5 points)
- Determine the best segmentation by iterating over model training and scoring, since EM isn't guaranteed to converge to the global maximum. (5 points)
We have provided the necessary tests for this part.
When multiplying many probabilities in sequence, you can end up with a probability of zero due to underflow. To avoid this, you should work with log probabilities throughout the assignment.
The log form of the Gaussian probability of a scalar value x is:

$$\ln N(x \mid \mu, \sigma) = -\ln\!\left(\sigma\sqrt{2\pi}\right) - \frac{(x-\mu)^2}{2\sigma^2}$$

where μ is the mean and σ is the standard deviation.
You can compute the log of a sum of exponentials using scipy.misc.logsumexp(). For example, logsumexp([-2, -3]) will return the same result as numpy.log(numpy.exp(-2) + numpy.exp(-3)). In other words, logsumexp([a, b]) = log(e^a + e^b).
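As a hedged sketch of how the log-Gaussian formula and logsumexp fit together when evaluating the mixture density at a single grayscale value (the function name and arguments here are illustrative, not the required API):
import numpy as np
from scipy.misc import logsumexp  # moved to scipy.special.logsumexp in newer SciPy versions

def log_mixture_density(x, means, variances, mixing_coefficients):
    """Sketch: log f(x) = logsumexp over components of log(m_i) + log N(x | mu_i, sigma_i^2)."""
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    mixing = np.asarray(mixing_coefficients, dtype=float)
    # log N(x | mu, sigma) = -0.5*log(2*pi*sigma^2) - (x - mu)^2 / (2*sigma^2)
    log_gauss = -0.5 * np.log(2 * np.pi * variances) - (x - means) ** 2 / (2 * variances)
    return logsumexp(np.log(mixing) + log_gauss)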
Rather than using lists of lists, you will find it much easier to store your data in numpy.array arrays. You can instantiate them with matrix = numpy.zeros([rows, columns]), where rows is the number of rows and columns is the number of columns in your matrix; numpy.zeros() generates a matrix of the specified size containing 0s in each cell. You can access cells with the syntax matrix[2,3], which returns the value at row index 2 and column index 3 (indices start at 0).
Warning: You may lose all marks for this part if your code runs for too long.
You will need to vectorize your code in this part. Specifically, the method train_model() needs to perform operations using numpy arrays, as does likelihood(), which calculates the log likelihood. These are time-sensitive operations and will be called over and over as you proceed with this assignment.
For the assignment, focus on vectorizing the following:
- The calculations for the Expectation step, where you calculate joint probabilities
- The calculations where you update your means, variances and mixing coefficients in the Maximization step.
Remember, these are fundamental operations and will be called a lot in the remainder of the assignment. So it is crucial you optimize these.
For the synthetic data test we provide to check whether your training is working, the data set is small enough that vectorization won't make a difference. But with the actual image used later, for-loops won't do. Vectorized code should take under 30 seconds to converge, which typically involves about 15-20 iterations with the convergence function provided here. Inefficient code that loops over each pixel value sequentially will take hours to run. You don't want to do that because:
- You won't have much time left to test your code.
- You won't get the marks. We will cap the run time and kill anything that takes more than 2 minutes per iteration.
You will want to have your image pixel values as a one-dimensional array to perform these operations. We have provided a method to flatten the image for this purpose.
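As a rough sketch of why the flattened form helps (assuming the helper functions defined earlier; the parameter values are illustrative): broadcasting a one-dimensional pixel array against the per-component parameters produces all per-pixel, per-component quantities at once.
import numpy as np

image_matrix = image_to_matrix('images/party_spock.png', grays=True)
pixels = flatten_image_matrix(image_matrix).flatten()  # shape (num_pixels,)
example_means = np.array([0.1, 0.5, 0.9])              # illustrative component means
# (num_pixels, num_components) array of squared deviations, no Python loop needed
sq_devs = (pixels[:, None] - example_means[None, :]) ** 2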
def default_convergence(prev_likelihood, new_likelihood, conv_ctr, conv_ctr_cap=10):
"""
Default condition for increasing
convergence counter:
new likelihood deviates less than 10%
from previous likelihood.
params:
prev_likelihood = float
new_likelihood = float
conv_ctr = int
conv_ctr_cap = int
returns:
conv_ctr = int
converged = boolean
"""
increase_convergence_ctr = (abs(prev_likelihood) * 0.9 <
abs(new_likelihood) <
abs(prev_likelihood) * 1.1)
if increase_convergence_ctr:
conv_ctr+=1
else:
conv_ctr =0
return conv_ctr, conv_ctr > conv_ctr_cap
from random import randint
import math
from scipy.misc import logsumexp
class GaussianMixtureModel:
"""
A Gaussian mixture model
to represent a provided
grayscale image.
"""
def __init__(self, image_matrix, num_components, means=None):
"""
Initialize a Gaussian mixture model.
params:
image_matrix = (grayscale) numpy.ndarray[numpy.ndarray[float]]
num_components = int
"""
self.image_matrix = image_matrix
self.num_components = num_components
if(means is None):
self.means = [0]*num_components
else:
self.means = means
self.variances = [0]*num_components
self.mixing_coefficients = [0]*num_components
def joint_prob(self, val):
"""Calculate the joint
log probability of a greyscale
value within the image.
params:
val = float
returns:
joint_prob = float
"""
# TODO: finish this
raise NotImplementedError()
return joint_prob
def initialize_training(self):
"""
Initialize the training
process by setting each
component mean to a random
pixel's value (without replacement),
each component variance to 1, and
each component mixing coefficient
to a uniform value
(e.g. 4 components -> [0.25,0.25,0.25,0.25]).
NOTE: this should be called before
train_model() in order for tests
to execute correctly.
"""
# TODO: finish this
raise NotImplementedError()
def train_model(self, convergence_function=default_convergence):
"""
Train the mixture model
using the expectation-maximization
algorithm. Since each Gaussian is
a combination of mean and variance,
this will fill self.means and
self.variances, plus
self.mixing_coefficients, with
the values that maximize
the overall model likelihood.
params:
convergence_function = function that returns True if convergence is reached
"""
# TODO: finish this
raise NotImplementedError()
def segment(self):
"""
Using the trained model,
segment the image matrix into
the pre-specified number of
components. Returns the original
image matrix with the each
pixel's intensity replaced
with its max-likelihood
component mean.
returns:
segment = numpy.ndarray[numpy.ndarray[float]]
"""
# TODO: finish this
raise NotImplementedError()
return segment
def likelihood(self):
"""Assign a log
likelihood to the trained
model based on the following
formula for posterior probability:
ln(Pr(X | mixing, mean, stdev)) = sum((n=1 to N),ln(sum((k=1 to K), mixing_k * N(x_n | mean_k, stdev_k) )))
returns:
log_likelihood = float
"""
# TODO: finish this
raise NotImplementedError()
return log_likelihood
def best_segment(self, iters):
"""Determine the best segmentation
of the image by repeatedly
training the model and
calculating its likelihood.
Return the segment with the
highest likelihood.
params:
iters = int
returns:
segment = numpy.ndarray[numpy.ndarray[float]]
"""
# finish this
raise NotImplementedError()
return segment
def gmm_likelihood_test():
"""Testing the GMM method
for calculating the overall
model probability.
Should return -364370.
returns:
likelihood = float
"""
image_file = 'images/party_spock.png'
image_matrix = image_to_matrix(image_file)
num_components = 5
gmm = GaussianMixtureModel(image_matrix, num_components)
gmm.initialize_training()
gmm.means = [0.4627451, 0.10196079, 0.027450981, 0.011764706, 0.1254902]
likelihood = gmm.likelihood()
return likelihood
gmm_likelihood_test()
def gmm_joint_prob_test():
"""Testing the GMM method
for calculating the joint
log probability of a given point.
Should return -0.98196.
returns:
joint_prob = float
"""
image_file = 'images/party_spock.png'
image_matrix = image_to_matrix(image_file)
num_components = 5
gmm = GaussianMixtureModel(image_matrix, num_components)
gmm.initialize_training()
gmm.means = [0.4627451, 0.10196079, 0.027450981, 0.011764706, 0.1254902]
test_val = 0.4627451
joint_prob = gmm.joint_prob(0.4627451)
return joint_prob
gmm_joint_prob_test()
def generate_test_mixture(data_size, means, variances, mixing_coefficients):
"""
Generate synthetic test
data for a GMM based on
fixed means, variances and
mixing coefficients.
params:
data_size = (int)
means = [float]
variances = [float]
mixing_coefficients = [float]
returns:
data = np.array[float]
"""
data = np.zeros(data_size).flatten()
indices = np.random.choice( len(means), len(data), p=mixing_coefficients)
for i in range(len(indices)):
data[i] = np.random.normal(means[indices[i]], variances[indices[i]])
return np.array([data])
def gmm_train_test():
"""Test the training
procedure for GMM using
synthetic data.
returns:
gmm = GaussianMixtureModel
"""
print( 'Synthetic example with 2 means')
num_components = 2
data_range = (1,1000)
actual_means = [2, 4]
actual_variances = [1]*num_components
actual_mixing = [.5]*num_components
dataset_1 = generate_test_mixture(data_range, actual_means, actual_variances, actual_mixing)
gmm = GaussianMixtureModel(dataset_1, num_components)
gmm.initialize_training()
# start off with faulty means
gmm.means = [1,3]
initial_likelihood = gmm.likelihood()
gmm.train_model()
final_likelihood = gmm.likelihood()
likelihood_difference = final_likelihood - initial_likelihood
likelihood_thresh = 250
if(likelihood_difference >= likelihood_thresh):
print('Congrats! Your model\'s log likelihood improved by at least %d.'%(likelihood_thresh))
print( 'Synthetic example with 4 means:')
num_components = 4
actual_means = [2,4,6,8]
actual_variances = [1]*num_components
actual_mixing = [.25]*num_components
dataset_1 = generate_test_mixture(data_range,
actual_means, actual_variances, actual_mixing)
gmm = GaussianMixtureModel(dataset_1, num_components)
gmm.initialize_training()
# start off with faulty means
gmm.means = [1,3,5,9]
initial_likelihood = gmm.likelihood()
gmm.train_model()
final_likelihood = gmm.likelihood()
# compare likelihoods
likelihood_difference = final_likelihood - initial_likelihood
likelihood_thresh = 200
if(likelihood_difference >= likelihood_thresh):
print('Congrats! Your model\'s log likelihood improved by at least %d.'%(likelihood_thresh))
return gmm
gmm_train_test()
def gmm_segment_test():
"""
Apply the trained GMM
to unsegmented image and
generate a segmented image.
returns:
segmented_matrix = numpy.ndarray[numpy.ndarray[float]]
"""
image_file = 'images/party_spock.png'
image_matrix = image_to_matrix(image_file)
num_components = 3
gmm = GaussianMixtureModel(image_matrix, num_components)
gmm.initialize_training()
gmm.train_model()
segment = gmm.segment()
segment_num_components = len(np.unique(segment))
if(segment_num_components == num_components):
print('Congrats! Your segmentation produced an image '+
'with the correct number of components.')
return segment
def gmm_best_segment_test():
"""
Calculate the best segment
generated by the GMM and
compare the subsequent likelihood
of a reference segmentation.
Note: this test will take a while
to run.
returns:
best_seg = np.ndarray[np.ndarray[float]]
"""
image_file = 'images/party_spock.png'
image_matrix = image_to_matrix(image_file)
image_matrix_flat = flatten_image_matrix(image_matrix)
num_components = 3
gmm = GaussianMixtureModel(image_matrix, num_components)
gmm.initialize_training()
iters = 10
# generate best segment from 10 iterations
# and extract its likelihood
best_seg = gmm.best_segment(iters)
matrix_to_image(best_seg, 'images/best_segment_spock.png')
best_likelihood = gmm.likelihood()
# extract likelihood from reference image
ref_image_file = 'images/party_spock%d_baseline.png'%(num_components)
ref_image = image_to_matrix(ref_image_file, grays=True)
gmm_ref = GaussianMixtureModel(ref_image, num_components)
ref_vals = ref_image.flatten()
ref_means = list(set(ref_vals))
ref_variances = [0]*num_components
ref_mixing = [0]*num_components
for i in range(num_components):
relevant_vals = ref_vals[ref_vals==ref_means[i]]
ref_mixing[i] = float(len(relevant_vals)) / float(len(ref_vals))
ref_variances[i] = np.mean((image_matrix_flat[ref_vals==ref_means[i]] - ref_means[i])**2)
gmm_ref.means = ref_means
gmm_ref.variances = ref_variances
gmm_ref.mixing_coefficients = ref_mixing
ref_likelihood = gmm_ref.likelihood()
# compare best likelihood and reference likelihood
likelihood_diff = best_likelihood - ref_likelihood
likelihood_thresh = 1e4
if(likelihood_diff >= likelihood_thresh):
print('Congrats! Your image segmentation is an improvement over ' +
'the baseline by at least %.2f.'%(likelihood_thresh))
return best_seg
best_segment = gmm_best_segment_test()
matrix_to_image(best_segment, 'best_segment.png')
Part 3: Model experimentation
20 points
We’ll now experiment with a few methods for improving GMM performance.
3a: Improved initialization
12.5 points
To run EM in our baseline Gaussian mixture model, we use random initialization to determine the initial values for our component means. We can do better than this!
Fill in the below GaussianMixtureModelImproved.initialize_training() with an improvement in component initialization. Please don't use any external packages for anything other than basic calculations (e.g. scipy.misc.logsumexp). Note that your improvement might significantly slow down runtime, although we don't expect you to spend more than 10 minutes on initialization.
Hint: you’ll probably want an unsupervised learning method to initialize your component means. Clustering is one useful example of unsupervised learning, and you may want to look at 1-dimensional methods such as Jenks natural breaks optimization.
class GaussianMixtureModelImproved(GaussianMixtureModel):
"""A Gaussian mixture model
for a provided grayscale image,
with improved training
performance."""
def initialize_training(self):
"""
Initialize the training
process by setting each
component mean using some algorithm that you think might give better means to start with,
each component variance to 1, and
each component mixing coefficient
to a uniform value
(e.g. 4 components -> [0.25,0.25,0.25,0.25]).
[You can feel free to modify the variance and mixing coefficient initializations too if that works well.]
"""
# TODO: finish this
raise NotImplementedError()
def gmm_improvement_test():
"""
Tests whether the new mixture
model is actually an improvement
over the previous one: if the
new model has a higher likelihood
than the previous model for the
provided initial means.
returns:
original_segment = numpy.ndarray[numpy.ndarray[float]]
improved_segment = numpy.ndarray[numpy.ndarray[float]]
"""
image_file = 'images/party_spock.png'
image_matrix = image_to_matrix(image_file)
num_components = 3
initial_means = [0.4627451, 0.20392157, 0.36078432]
# first train original model with fixed means
gmm = GaussianMixtureModel(image_matrix, num_components)
gmm.initialize_training()
gmm.means = np.copy(initial_means)
gmm.train_model()
original_segment = gmm.segment()
original_likelihood = gmm.likelihood()
# then train improved model
gmm_improved = GaussianMixtureModelImproved(image_matrix, num_components)
gmm_improved.initialize_training()
gmm_improved.train_model()
improved_segment = gmm_improved.segment()
improved_likelihood = gmm_improved.likelihood()
# then calculate likelihood difference
diff_thresh = 1e3
likelihood_diff = improved_likelihood - original_likelihood
if(likelihood_diff >= diff_thresh):
print('Congrats! Improved model scores a likelihood that was at ' +
'least %d higher than the original model.'%(diff_thresh))
return original_segment, improved_segment
best_segment, best_segment_improved = gmm_improvement_test()
matrix_to_image(best_segment, 'best_segment_original.png')
matrix_to_image(best_segment_improved, 'best_segment_improved.png')
3b: Convergence condition
7.5 points
You might be skeptical of the convergence criterion we've provided in default_convergence(). To test out another convergence condition, implement new_convergence_function() to return true if all the new model parameters (means, variances, and mixing coefficients) are within 10% of the previous values for 10 consecutive iterations. This will mean re-implementing train_model(), which you will do below in GaussianMixtureModelConvergence.
You can compare the two convergence functions in convergence_condition_test().
def new_convergence_function(previous_variables, new_variables, conv_ctr, conv_ctr_cap=10):
"""
Convergence function
based on parameters:
when all variables vary by
less than 10% from the previous
iteration's variables, increase
the convergence counter.
params:
previous_variables = [numpy.ndarray[float]] containing [means, variances, mixing_coefficients]
new_variables = [numpy.ndarray[float]] containing [means, variances, mixing_coefficients]
conv_ctr = int
conv_ctr_cap = int
return:
conv_ctr = int
converged = boolean
"""
# TODO: finish this function
raise NotImplementedError()
return conv_ctr, converged
class GaussianMixtureModelConvergence(GaussianMixtureModel):
"""
Class to test the
new convergence function
in the same GMM model as
before.
"""
def train_model(self, convergence_function=new_convergence_function):
# TODO: finish this function
raise NotImplementedError()
def convergence_condition_test():
"""
Compare the performance of
the default convergence function
with the new convergence function.
return:
default_convergence_likelihood = float
new_convergence_likelihood = float
"""
image_file = 'images/party_spock.png'
image_matrix = image_to_matrix(image_file)
num_components = 3
initial_means = [0.4627451, 0.10196079, 0.027450981]
# first test original model
gmm = GaussianMixtureModel(image_matrix, num_components)
gmm.initialize_training()
gmm.means = np.copy(initial_means)
gmm.train_model()
default_convergence_likelihood = gmm.likelihood()
# now test new convergence model
gmm_new = GaussianMixtureModelConvergence(image_matrix, num_components)
gmm_new.initialize_training()
gmm_new.means = np.copy(initial_means)
gmm_new.train_model()
new_convergence_likelihood = gmm_new.likelihood()
# test convergence difference
convergence_diff = new_convergence_likelihood - default_convergence_likelihood
convergence_thresh = 200
if(convergence_diff >= convergence_thresh):
print('Congrats! The likelihood difference between the original '
+ 'and the new convergence models should be at least %.2f'%(convergence_thresh))
return default_convergence_likelihood, new_convergence_likelihood
Part 4: Bayesian information criterion
20 points
In our previous solutions, our only criterion for choosing a model was whether it maximizes the posterior likelihood regardless of how many parameters this requires. As a result, the “best” model may simply be the model with the most parameters, which would be overfit to the training data.
To avoid overfitting, we can use the Bayesian information criterion (a.k.a. BIC) which penalizes models based on the number of parameters they use. In the case of the Gaussian mixture model, this is equal to the number of components times the number of variables per component (mean, variance and mixing coefficient) = 3*components.
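For reference, the standard form of the criterion (stated here as background, in the notation used above) is

$$\mathrm{BIC} = \ln(N)\,k - 2\ln(\hat{L})$$

where N is the number of data points (pixels), k is the number of free parameters (3 × number of components here), and ln(L̂) is the trained model's log likelihood; a lower BIC indicates a better trade-off between fit and complexity.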
4a: Implement BIC
5 points
Implement bayes_info_criterion() to calculate the BIC of a trained GaussianMixtureModel.
def bayes_info_criterion(gmm):
# TODO: finish this function
raise NotImplementedError()
return BIC
def bayes_info_test():
"""
Test for your
implementation of
BIC on fixed GMM values.
Should be about 727045.
returns:
BIC = float
"""
image_file = 'images/party_spock.png'
image_matrix = image_to_matrix(image_file)
num_components = 3
initial_means = [0.4627451, 0.10196079, 0.027450981]
gmm = GaussianMixtureModel(image_matrix, num_components)
gmm.initialize_training()
gmm.means = np.copy(initial_means)
BIC = bayes_info_criterion(gmm)
return BIC
4b: Test BIC
15 points
Now implement BIC_likelihood_model_test(), in which you will use the BIC and likelihood to determine the optimal number of components in the Party Spock image. Use the original GaussianMixtureModel for your models. Iterate from k=2 to k=7 and use the provided means to train a model that minimizes its BIC and a model that maximizes its likelihood.
Then, fill out BIC_likelihood_question() to return the number of components in both the min-BIC and the max-likelihood model.
def BIC_likelihood_model_test():
"""Test to compare the
models with the lowest BIC
and the highest likelihood.
returns:
min_BIC_model = GaussianMixtureModel
max_likelihood_model = GaussianMixtureModel
"""
# TODO: finish this method
raise NotImplementedError()
comp_means = [
[0.023529412, 0.1254902],
[0.023529412, 0.1254902, 0.20392157],
[0.023529412, 0.1254902, 0.20392157, 0.36078432],
[0.023529412, 0.1254902, 0.20392157, 0.36078432, 0.59215689],
[0.023529412, 0.1254902, 0.20392157, 0.36078432, 0.59215689, 0.71372563],
[0.023529412, 0.1254902, 0.20392157, 0.36078432, 0.59215689, 0.71372563, 0.964706]
]
return min_BIC_model, max_likelihood_model
def BIC_likelihood_question():
"""
Choose the best number of
components for each metric
(min BIC and maximum likelihood).
returns:
pairs = dict
"""
# TODO: fill in bic and likelihood
raise NotImplementedError()
bic = 0
likelihood = 0
pairs = {
'BIC' : bic,
'likelihood' : likelihood
}
return pairs
Bonus
2 points
A crucial part of machine learning is working with very large datasets. As stated before, using for loops over these datasets will result in the code taking many hours, or even several days, to run. Even vectorization can take time if not done properly, and as such there are certain tricks you can perform to get your code to run as fast as physically possible.
For this part of the assignment, you will need to implement part of a k-means algorithm. You are given two arrays: points_array with X n-dimensional points, and means_array with Y n-dimensional points. You will need to return an X x Y array containing the distances from each point in points_array to each point in means_array.
Your code will be tested using two very large arrays, against our reference implementation, which was designed by Murtaza Dhuliawala. Thus, you’ll be competing against the Head TA!
If your implementation returns the correct answer in time comparable to Murtaza’s implementation, you will receive 2 bonus points.
For reference, the data used is on the order of thousands of points and hundreds of means, and Bonnie automatically kills any grading script that uses more than 500 MB of memory. So please test locally before submitting, as you may lose a submission to an inefficient solution; it is very likely that you will run out of memory if your implementation is inefficient.
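One common vectorized approach, shown here only as a sketch (the function name is ours, and this is not necessarily how the reference implementation works), uses broadcasting to compute all pairwise Euclidean distances at once:
import numpy as np

def pairwise_distances_sketch(points_array, means_array):
    """Sketch: Euclidean distances from every point in an (X, n) array to every
    point in a (Y, n) array, returned as an (X, Y) array."""
    diffs = points_array[:, None, :] - means_array[None, :, :]  # shape (X, Y, n)
    return np.sqrt(np.sum(diffs ** 2, axis=2))

Note that the intermediate (X, Y, n) array can be large; if memory becomes the bottleneck, the expansion ||p - m||^2 = ||p||^2 + ||m||^2 - 2 p·m lets you compute the same result from matrix products without building that intermediate.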
def bonus(points_array, means_array):
"""
Return the distance from every point in points_array
to every point in means_array.
returns:
dists = numpy array of float
"""
# TODO: fill in the bonus function
# REMOVE THE LINE BELOW IF ATTEMPTING BONUS
raise NotImplementedError()
return dists
def bonus_test():
points = np.array([[ 0.9059608,0.67550357,0.13525533],[ 0.23656114,0.63624466,0.3606615 ],[ 0.91163215,0.24431103,0.33318504],[ 0.25209736,0.24600123,0.42392935],[ 0.62799146,0.04520208,0.55232494],[ 0.5588561, 0.06397713,0.53465371],[ 0.82530045,0.62811624,0.79672349],[ 0.50048147,0.13215356,0.54517893],[ 0.84725662,0.71085917,0.61111105],[ 0.25236734,0.25951904,0.70239158]])
means = np.array([[ 0.39874413,0.47440682,0.86140829],[ 0.05671347,0.26599323,0.33577454],[ 0.7969679, 0.44920099,0.37978416],[ 0.45428452,0.51414022,0.21209852],[ 0.7112214, 0.94906158,0.25496493]])
expected_answer = np.array([[ 0.90829883,0.9639127, 0.35055193,0.48575144,0.35649377],[ 0.55067427,0.41237201,0.59110637,0.29048911,0.57821151],[ 0.77137409,0.8551975, 0.23937264,0.54464354,0.73685561],[ 0.51484192,0.21528078,0.58320052,0.39705222,0.85652654],[ 0.57645778,0.64961631,0.47067874,0.60483973,0.95515036],[ 0.54850426,0.57663736,0.47862222,0.56358129,0.94064631],[ 0.45799673,0.966609,0.45458971,0.70173336,0.63993928],[ 0.47695785,0.50861901,0.46451987,0.50891112,0.89217387],[ 0.56543953,0.94798437,0.35285421,0.59357932,0.4495398 ],[ 0.30477736,0.41560848,0.66079087,0.58820896,0.94138546]])
if np.allclose(expected_answer,bonus(points,means),1e-7):
print('You returned the correct distances.')
else:
print('Your distance calculation is incorrect.')
bonus_test()
You’re done with the requirements! Hope you have completed the functions in the mixture_models.py file and tested everything!