Description
1. (20 points) Explore the GrabCut function in OpenCV. See
https://docs.opencv.org/3.1.0/d8/d83/tutorial_py_grabcut.html
In particular, demonstrate an example image of your own choosing where GrabCut works
well and an example image where it works poorly. (These images must be ones that you
find or take, not images that someone else has already worked on with GrabCut.) A big part of
your effort is determining the rectangle bounding the foreground and, within this rectangle,
which pixels are definitely foreground and which are definitely background. To do this, please
examine the image interactively (see the function show_with_pixel_values from Lecture 2,
ex5_stretch_img.py). Then, record one or more rectangular regions within the object you'd
like to be part of the foreground and one or more you'd like to be part of the background after
the GrabCut segmentation. Provide these to the GrabCut function as part of the mask. Be
sure to discuss why you think GrabCut succeeded or failed in each case. Also, be sure to
resize your image to a reasonable working dimension, say 500 to 600 pixels as the maximum
dimension, to make this practical.
Your Python code should take as arguments the original input image and a file containing
pixel coordinates defining rectangles in the image. Each line should contain four coordinates defining
a rectangle. The first rectangle, the "outer rectangle", should be the bounding rectangular
mask on the object. The remaining rectangles, the "inner rectangles", should all be
enclosed in this rectangle and should bound image locations that should be definitely inside
or definitely outside the segmented object. (You will need some way to distinguish the two.) I suggest,
for simplicity, that you find the coordinates of these rectangles by hand. I realize that this
is a bit clunky, but I don't want you to spend your time writing any fancy user-interaction
code.
Your write-up should show the original image and the image with the rectangles drawn on top:
the outer rectangle in one color, inner rectangles to be included in another, and inner rectangles
to be excluded in a third. Also show the resulting segmented image. Be sure to explain your
result.
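
As a concrete starting point, here is a minimal sketch of the mask-based GrabCut call. The rectangle-file format (outer rectangle alone on the first line, then a leading flag per inner rectangle, 1 for definitely-foreground and 0 for definitely-background) is one possible convention I am assuming, not something the assignment prescribes, and coordinates are assumed to refer to the already-resized image.

    import cv2
    import numpy as np

    def grabcut_with_mask(img_path, rect_path, max_dim=600):
        img = cv2.imread(img_path)
        scale = max_dim / max(img.shape[:2])      # cap the max dimension at ~600 px
        img = cv2.resize(img, None, fx=scale, fy=scale)

        # Assumed file format: "x0 y0 x1 y1" for the outer rectangle on the
        # first line, then "flag x0 y0 x1 y1" per inner rectangle
        # (flag 1 = definite foreground, flag 0 = definite background).
        with open(rect_path) as f:
            rows = [list(map(int, line.split())) for line in f if line.strip()]

        mask = np.full(img.shape[:2], cv2.GC_BGD, np.uint8)  # outside: definite bg
        x0, y0, x1, y1 = rows[0]
        mask[y0:y1, x0:x1] = cv2.GC_PR_FGD                   # outer rect: probable fg
        for flag, x0, y0, x1, y1 in rows[1:]:
            mask[y0:y1, x0:x1] = cv2.GC_FGD if flag else cv2.GC_BGD

        bgd = np.zeros((1, 65), np.float64)                  # internal model buffers
        fgd = np.zeros((1, 65), np.float64)
        cv2.grabCut(img, mask, None, bgd, fgd, 5, cv2.GC_INIT_WITH_MASK)

        fg = np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD)).astype(np.uint8)
        return img * fg[:, :, None]                          # zero out background

Displaying the returned image next to the original, with the rectangles overlaid in the three colors described above, produces the figures the write-up asks for.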
2. (20 points) Apply k-means clustering to attempt to segment the grass in the images provided
on Piazza. The "data" vectors input to k-means should at least be a concatenation of the
pixel location values and the RGB values at each pixel. You might include other measures,
such as the standard deviation of the R, G, and B values over a pixel's neighborhood, so that
you capture some notion of the variation in intensity (e.g. solid green regions aren't very
grass-like). You might have to scale your position, color, or other measurements to give them
more or less influence during k-means. Vary the value of k to see the effect; perhaps several
clusters together cover the grass.
Note that I don’t expect perfect segmentations. More sophisticated algorithms are needed
for this. Instead, I want you to develop an understanding of the use of k-means and of the
difficulty of the segmentation problem.
In addition to your code, please include in your write-up a small selection of images from
the ones I provided demonstrating good and bad results. Include each original image, and
separately show the clusters drawn on the images. Include example images that indicate the
effect of varying k and, perhaps, that demonstrate the effect of including measures other than
color and position.
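
Here is a minimal sketch of building such feature vectors and clustering them with OpenCV's k-means. The 5x5 neighborhood, the position weight, and the helper name kmeans_segment are illustrative assumptions rather than required choices.

    import cv2
    import numpy as np

    def kmeans_segment(img, k=4, pos_weight=0.5):
        h, w = img.shape[:2]
        ys, xs = np.mgrid[0:h, 0:w]

        # Per-pixel neighborhood standard deviation of each channel, via the
        # identity std = sqrt(E[x^2] - E[x]^2) computed with box filters.
        f = img.astype(np.float32)
        mean = cv2.blur(f, (5, 5))
        sq_mean = cv2.blur(f * f, (5, 5))
        std = np.sqrt(np.maximum(sq_mean - mean * mean, 0))

        # Feature vector per pixel: (weighted x, y, B, G, R, std per channel).
        feats = np.column_stack([
            pos_weight * xs.ravel(), pos_weight * ys.ravel(),
            f.reshape(-1, 3), std.reshape(-1, 3),
        ]).astype(np.float32)

        criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
        _, labels, _ = cv2.kmeans(feats, k, None, criteria, 3,
                                  cv2.KMEANS_RANDOM_CENTERS)
        return labels.reshape(h, w)

Coloring each pixel by its cluster label then shows which clusters, alone or together, cover the grass, and makes the effect of k and of the extra features easy to compare.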
3. (60 points) For this problem and for HW 6 we are going to consider the problem of determining
the dominant "background" class of a scene. The five possibilities we will consider in
this example are grass, wheat field, road, ocean, and red carpet. Some of these are relatively
easy, but others are hard. A link to the images will be provided on the Piazza site.
In our lecture on detection we focused on the "HoG" (histogram of oriented gradients)
descriptor. Beyond this, many different types of descriptors have been invented. For this
problem you are going to implement a descriptor that combines location and color and uses
no gradient information whatsoever. One descriptor will be computed for the entirety of each
image. A series of SVM classifiers that you train, one for each desired class, will then be
applied to the descriptor vector to make a decision.
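
Jumping ahead to the classification stage (the descriptor itself is developed next), a one-vs-rest arrangement with OpenCV's SVM might look like the sketch below. The helper names, the linear kernel, and the use of raw decision values for tie-breaking are all my assumptions, not part of the assignment.

    import cv2
    import numpy as np

    def train_one_vs_rest(descriptors, labels, class_ids):
        # descriptors: float32 array, one row per training image.
        # labels: int32 class id per row.  Both are assumed precomputed.
        svms = {}
        for c in class_ids:
            y = np.where(labels == c, 1, -1).astype(np.int32)
            svm = cv2.ml.SVM_create()
            svm.setType(cv2.ml.SVM_C_SVC)
            svm.setKernel(cv2.ml.SVM_LINEAR)      # placeholder kernel choice
            svm.train(descriptors, cv2.ml.ROW_SAMPLE, y)
            svms[c] = svm
        return svms

    def classify(svms, desc):
        # Pick the class whose SVM votes most strongly for membership.
        # OpenCV's raw decision value has a sign convention that depends on
        # label ordering, hence the negation; verify it on held-out data.
        scores = {}
        for c, svm in svms.items():
            _, out = svm.predict(desc.reshape(1, -1),
                                 flags=cv2.ml.STAT_MODEL_RAW_OUTPUT)
            scores[c] = -out[0, 0]
        return max(scores, key=scores.get)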
Let's start with the descriptor. Imagine that an image's three color channels (R, G, B) are
represented as a 3D cube, where the x axis is the red color channel, the y axis is the green
color channel, and the z axis is the blue color channel. In this representation, a particular