## Description

P1: De-beeper [5 points]

1. x.wav is a speech signal contaminated by a beep sound. As I haven’t taught you guys how to

properly do speech enhancement yet, you’re not supposed to know a machine learning-based

solution to this problem (don’t worry I’ll cover it soon). Instead, you did learn how to do

STFT, so I want you to at least manually erase the beep sound from this signal to recover

the clean speech source. If you have a perfect pitch, you might be able to know the pitch of

the beep, but I assume that you don’t. That’s why you have to see the spectrogram to find

out the beep frequency.

2. First off, create a DFT matrix F using the first equation in M02-S12. You’ll of course create

a N × N complex matrix, but if you see its real and imaginary versions separately, you’ll see

something like the ones in M02-S14 (the ones in the slide are 20 × 20). Determine your N,

which is the frame size shown in M02-S17. For example, since the signal’s sampling rate is

16kHz, if your N is 1600, your frequency resolution (the range a Fourier coefficient covers)

will be 10Hz. If your N is large, you’ll get finer frequency resolution and vice versa. Feel free

to try out a few different choices, but N = 1600 must be enough.

3. Prepare your data matrix X. You extract the first frame of N samples from the input signal,

and apply a Hann window (or any other windows that can overlap-and-add to one). What

that means is that from the definition of Hann window1

, you create a window of size N and

1

I allow you to use a pre-defined scipy function to compute this window, but

I encourage you to go ahead and implement one based on the simple equation:

https://en.wikipedia.org/wiki/Window function#Hann and Hamming windows

1

element-wise multiply the window and your N audio samples. Place it as your first column

vector of the data matrix X. Move by N/2 samples. Extract another frame of N samples

and apply the window. This goes to the second column vector of X. Do it for your third

frame (which should start from (N + 1)’th sample), and so on. Since you moved just by the

half of the frame size, your frames are overlapping each other by 50%.

4. Apply the DFT matrix from P1.2 to your data matrix, i.e., F X. This is your spectrogram

with complex values. See how it looks like (by taking magnitudes and plotting). Locate two

thin horizontal lines. They are from the beep sound. Note that due to the conjugacy your

spectrogram is mirrored vertically. The bottom half is a mirrored version of the top half in

terms of their magnitudes, although their imaginary parts are with a different sign (complex

conjugate). The spectrograms you’ve seen in class are the top half of a spectrogram, because

the bottom half has no useful information (except for the flipped phase). This is why you

“see” two beeper lines in your spectrogram, although you hear just one beep tone. Anyway,

locate them, and make the entire row zero.

5. Apply the inverse DFT matrix, which you can also create by using the equation in M2 S12.

Let’s call this F

∗

. Since it’s the inverse transform, F

∗F ≈ I (you can check it, although the

off diagonals might be a very small number rather than zero). You apply this matrix to your

spectrogram, which is free from the beep tones, to get back to the recovered version of your

data matrix, Xˆ . In theory this should give you a real-valued matrix, but you’ll still see some

imaginary parts with a very small value. Ignore them by just taking the real part. Reverse

the procedure in P1.3 to get the time domain signal. Basically it must be a procedure that

transpose every column vector of Xˆ and overlap-and-add the right half of t-th row vector

with the left half of the (t + 1)-th row vector and so on. Listen to the signal to check if the

beep tones are gone.

6. Submit your code and the de-beepped audio file, by embedding it into your notebook.

HW2 P2: Parallax [5 points]

1. You live in a planet far away from the earth. Your solar system belongs to a galaxy, which

is about to merge with another galaxy (it is not rare in the outer space, but don’t worry, the

merger takes a few billions of years). Anyhow, because of this merger, in the deep sky you

see lots of stars from your galaxy as well as the other stars in the other neighboring galaxy.

Of course you don’t know which one is from which galaxy though.

2. You are going to use a technique called “parallax” to solve this problem. It’s actually very

similar to the computer vision algorithm called “stereo matching,” where stereophonic cameras

find out the 3D depth information from the visual scene. That’s actually why we humans can

recognize the distance of a visual object (we have two eyes). See Figure 1 for an example.

3. Let’s get back to your remote planet. In your planet, parallax works by taking a picture of

the deep sky in June and another one in December (yes, you have 12 months there, too). If

you take a picture of the deep sky, you see the stars nearby (i.e., the ones in your galaxy)

changes their position much more in the two pictures, while the stars far way (i.e., the ones

in the neighboring galaxy) change their position less. See Figure 2 and 3.

2

Left image Right image

Figure 1: The tree is closer than the mountain. So, from the left camera, the tree is located on

the right hand side, while the right camera captures it on the left hand side. On the contrary, the

mountain in the back does not have this disparity.

Your planet in June

Your planet in Dec.

Your sun

A star in your galaxy

The other stars

in the neighboring galaxy

Figure 2: You take two pictures of the same area of the sky in June and December, respectively.

Because of the pretty big movement of your planet due to its revolution, you can see that the

close-by stars oscillate more in the two pictures than the far-away ones, just like the tree and the

mountain in Figure 1.

3

June Dec

Figure 3: The oscillation of the close star (green) in the two pictures. Note that the other stars

(blue) don’t move much.

4. june.png and december.png are the two pictures you took for this parallax project. In

theory, you need to apply a computer vision technique, called “non-maximum suppression,”

with which you can identify the x-y coordinates of all the stars in the two pictures. In theory,

for each of the stars in june.png, you look for its position in december.png by scanning a

star in the same row (because the stars always move to right). But, I’m going to save this

part for Prof. Crandall so that he can use this topic in his homework assignment (if he wants

to).

5. Instead, I provide you with the (x, y) coordinates of all the stars in the two pictures. june.mat

and december.mat contain the positions of the stars in the two pictures. Each row has two

coordinates, x-coordinate and y-coordinate, and there are 2,700 such rows in each matrix,

each of which is for a particular star.

6. Take the first column vector of each matrix (i.e. the x-coordinates of the stars in June and

December, respectively). If you subtract the June vector from the December vector, the

difference defines disparity, or the amount of oscillation, which is a vector of 2,700 elements.

Draw a histogram out of these disparity values to gauge if you can see two clusters there.

7. Perform k-means clustering on this disparity dataset. Find out the cluster means and report

the values. Which one do you think corresponds to the stars in your galaxy and which is for

the other galaxy? Why do you think so? Justify your answer in the report.

8. Note that you’re not supposed to use someone else’ implementation. Write up your own code

for k-means clustering.

9. One tip for implementing k-means clustering from scratch is to make sure that each cluster

contains at least one sample throughout the iterative updates. It’s to avoid any trivial solution,

e.g., the algorithm learns a gigantic cluster that contains everything.

HW2 P3: GMM for Parallax [5 points]

1. Implement your own EM algorithm for GMM and propose an alternative solution to the

previous parallax problem. Report the estimated means, variances, and prior weights. Explain

which one you prefer between the k-means solution and the GMM results. Justify your answer.

4