Sale!

Solved CSci 4270 and 6270 Computational Vision Homework 1 1. (20 points) Write a script that takes a single image

$30.00 $18.00

Original Work ?

Download Details:

  • Name: hw1-znkl8g.zip
  • Type: zip
  • Size: 15.27 MB

Category: Tags: , , You will Instantly receive a download link upon Payment||Click Original Work Button for Custom work

Description

5/5 - (1 vote)

1. (20 points) Write a script that takes a single image and creates a checkerboard pattern from
it. The command-line will look like
python p1_checkerboard.py im out_im m n
Input image im should be cropped to make it square and resized to make it m × m. Next, it
should be formed into a 2×2 grid of m×m images. The 0,0 entry for the grid should show the
downsized image, and the 1,1, entry for the grid should show the image upside down. Then
the 0,1 entry should show the 0,0 image with the colors of the image inverted so that each
color intensity value p is replaced by 255−p, and the 1,0 entry should show the 1,1 entry with
the colors inverted. Finally, replicate the 2×2 grid of images to make it 2n × 2n, generating
a final image having 2nm × 2nm pixels. Save the result to out_im. Use NumPy functions
concatenate and tile to create the final image. See discussion of np.tile below.
Here is an example command line
python p1_checkerboard.py mountain3.jpg p1_mountain3_checker_out.jpg 120 4
and desired output
Image mountain3.jpg cropped at (0, 420) and (1079, 1499)
Resized from (1080, 1080, 3) to (120, 120, 3)
The checkerboard with dimensions 960 X 960 was output to p1_mountain3_checker_out.jpg
2. (20 points) Do you recognize Abraham Lincoln in this picture?
2
If you don’t you might be able to if you squint or look from far away. Try it now. In this
problem you will write a script to generate such a blocky, scaled-down image. The idea is to
form the block image from the input image, which you will read as a grayscale: Do this in
two steps:
(a) Compute a “downsized image” where each pixel represents the average intensity across
a region of the input image.
(b) Generate the larger block image by expanding each pixel in the downsized image to a
block of pixels having the same intensity.
(c) Generate a binary image version of the downsized image and make a block version of it
as well.
The input to your script will be an image and three integers:
python p2_block.py img m n b
The values m and n are the number of rows and columns, respectively, in the downsized
image, while b is the size of the blocks that replace each downsized pixel. The resulting image
should have mb rows and nb columns.
When creating the downsized image, start by generating two scale factors, sm and sn. If the
input image has M rows and N columns, then we have sm = M/m and sn = N/n. (Notice
that these will be float values.) The pixel value at each location (i, j) of the downsized image
will be the (float) average intensity of the region from the original gray scale image whose row
values include round(i ∗ sm) up to (but not including) round((i + 1) ∗ sm) and whose column
values include round(j ∗ sn) up to (but not including) round((j + 1) ∗ sn).
You will then create a second downsized image that will be a binary version of the first
downsized image. The threshold for the image will be decided such that half the pixels are
3
0’s and half the pixels are 255. More precisely, any pixel whose value (in the downsized
image) is greater than or equal to the median value (NumPy has a median function) should
be 255 and anything else should be 0. Note that this means the averages should be kept
as floating point values before before forming the binary image.
Once you have created both of these downsized images, you can easily upsample them to
create the block images. Before doing this, convert the average gray scale image to integer
by rounding.
The gray scale block image should be output to a file whose name is the same as the input
file, but with _g appended to the name just before the file extension. The binary block image
should be output to a file whose name is the same as the input file, but with _b appended to
the name just before the file extension.
Text output should include the following:
• The size of the downsized images.
• The size of the block images.
• The average output intensity (as float values accurate to two decimals) at the following
downsized pixel locations:
– (m // 4, n // 4)
– (m // 4, 3n // 4)
– (3m // 4, n // 4)
– (3m // 4, 3n // 4)
• The threshold for the binary image output, accurate to two decimals.
• The names of the output images.
Here is an example.
python p2_block.py lincoln1.jpg 25 18 15
which produces the output
Downsized images are (25, 18)
Block images are (375, 270)
Average intensity at (6, 4) is 59.21
Average intensity at (6, 13) is 55.46
Average intensity at (18, 4) is 158.30
Average intensity at (18, 13) is 35.33
Binary threshold: 134.68
Wrote image lincoln1_g.jpg
Wrote image lincoln1_b.jpg
Important Notes:
(a) To be sure you are consistent with our output, convert the input image to grayscale as
you read it using cv2.imread, i.e.
im = cv2.imread(sys.argv[1], cv2.IMREAD_GRAYSCALE)
4
(b) You are only allowed to use for loops over the pixel indices of the downsized images
(i.e. the 25×18 pixel image in the above example). In addition, avoid using for loops
when converting to a binary image.
(c) Be careful with the types of the values stored in your image arrays. Internal computations
should use np.float32 or np.float64 whereas output images should use np.uint8.
3. (20 points) Image manipulation software tools include methods of introducing shading in
images, for example, darkening from the left or right, top or bottom, or even from the center.
Examples are shown in the following figure, where the image darkens as we look from left to
right in the first example and the image darkens as we look from the center to the sides or
corners of the image in the second example.
The problem here is to take an input image I, create a shaded image Is, and output the input
image and its shaded version (I and Is) side-by-side in a single image file. Supposing I has M
rows and N columns, the central issue is to form an M × N array of multipliers with values
in the range [0, 1] and multiply this by each channel of I. For example, values scaling from 0
in column 0 to 1 in column N − 1, with i/(N − 1) in column i, produce an image that is dark
on the left and bright on the right (opposite the first example above). This M × N array is
called an alpha mask, or mask.
Write a Python program that accomplishes this. The command-line should run as
python p3_shade.py in_img out_img dir
where dir can take on one of five values, left, top, right, bottom, center. (If dir is
not one of these values, do nothing. We will not test this case.) The value of dir indicates the
side or corner of the image where the shading starts. In all cases the value of the multiplier
should be proportional to 1 −d(r, c), where d(r, c) is the distance from pixel (r, c) to the start
of the shading, normalized so that the maximum distance is 1. For example, if the image is
7 × 5 and dir == ’right’ then the multipliers should be
[[ 0. , 0.25, 0.5 , 0.75, 1. ],
[ 0. , 0.25, 0.5 , 0.75, 1. ],
[ 0. , 0.25, 0.5 , 0.75, 1. ],
5
[ 0. , 0.25, 0.5 , 0.75, 1. ],
[ 0. , 0.25, 0.5 , 0.75, 1. ],
[ 0. , 0.25, 0.5 , 0.75, 1. ],
[ 0. , 0.25, 0.5 , 0.75, 1. ]])
whereas if the image is 5 × 7 and dir == ’center’ then the multipliers should be
[[0. 0.216 0.38 0.445 0.38 0.216 0. ]
[0.123 0.38 0.608 0.723 0.608 0.38 0.123]
[0.168 0.445 0.723 1. 0.723 0.445 0.168]
[0.123 0.38 0.608 0.723 0.608 0.38 0.123]
[0. 0.216 0.38 0.445 0.38 0.216 0. ]]
(I used np.set_printoptions(precision = 3) to generate this formatting.) In addition
to outputing the final image (the combination of original and shaded images), the program
should output, accurate to three decimal places, nine values of the multiplier. These are
at the Cartesian product of rows (0, M//2, M − 1) and columns (0, N//2, N − 1) (where //
indicates integer division). For example, my solution’s output for image mountain2.jpg with
M = 1080 and N = 1920 and direction ’center’ is
(0,0) 0.000
(0,960) 0.510
(0,1919) 0.001
(540,0) 0.128
(540,960) 1.000
(540,1919) 0.129
(1079,0) 0.000
(1079,960) 0.511
(1079,1919) 0.001
These values are the only printed output required from your program.
Important Notes:
(a) Start by generating a 2d array of pixel distances in the row dimension and a second 2d
array of pixel distances in the column dimension, then combine these using NumPy operators and universal functions, ending with normalization so that the maximum distance
is 1. The generation of distance arrays starts with np.arange to create one distance
dimension and then extends it to two dimensions np.tile. For example,
>>> import numpy as np
>>> a = np.arange(5)
>>> np.tile(a, (3,1))
array([[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]])
After you have the distance array, simply subtract the array from 1 to get the multipliers.
(b) Please do not use np.fromfunction to generate the multiplier array because it is essentially the same as nested for loops over the image with a Python call at each location.
6
(c) Please use (M // 2, N // 2) as the center pixel of the image.
4. (20 points) How do you decide how similar two images are to each other? This question is
at the heart of the recognition problem that pervades computer vision, and therefore it has
been studied for years. Here we will consider a simple method that is a precursor to more
sophisticated methods we will see later in the semester.
Your script will read in each image in a directory. It will reduce each image to a vector of
length 3n
2
. It will then find the distance between each pair of images. For each image, in
the order produced by sort, it must find the closest image, and then it must output the two
images and the distance, accurate to 3 decimal places.
To encode an image in a vector — often called a descriptor vector — we divide the image into
n × n regions that are equal in size (perhaps differing by one pixel). Use the same method
as you did in Problem 2 when creating the downsized image. In each region, compute the
average red, green and blue intensities. Concatenate these in row-major order to form the
descriptor. In other words, if ri,j , gi,j , bi,j are the average RGB values from region i, j (here
i represents rows and j represents columns), then the vector should be formed as
r0,0, g0,0, b0,0, r0,1, g0,1, b0,1, . . . r0,n−1, g0,n−1, b0,n−1, r1,0, g1,0, b1,0, . . .
Finally, normalize this vector (use np.linalg.norm) so that its magnitude is 1.0, and then
scale all values by 100. The normalization step is intended to correct for brightness differences between images, while the 100 scaling converts to percentages to make the values more
intuitive. Call the result the RGB descriptor. Output the final values of r0,0, g0,0, b0,0 and
rn−1,n−1, gn−1,n−1, bn−1,n−1 for the first image.
The command line for your program should be
python p4_closest.py img-folder n
where img-folder is the file folder containing the images (only consider files whose lower-case
extension is .jpg) and n is the number of regions in the row and column dimensions. Each
time the program is run, it should use both the RGB and the L*a*b descriptors, generating
two sets of output. All numerical output should be accurate to 2 decimal points. Here is an
example based on four images that will be distributed with the assignment.
Nearest distances
First region: 20.281 21.207 22.185
Last region: 6.762 6.497 6.520
central_park.jpg to skyline.jpg: 21.65
hop.jpg to times_square.jpg: 24.17
skyline.jpg to central_park.jpg: 21.65
times_square.jpg to hop.jpg: 24.17
In this example, there is symmetry in the closest distances. This will not always be the case.