Homework 0 COMS 4771 solution

$29.99

Original Work ?
Category: You will Instantly receive a download link for .ZIP solution file upon Payment

Description

5/5 - (4 votes)

Problem 1 (Values). Name one or two of your own personal, academic, or career values, and
explain how you hope machine learning can be of service to those values.
1
Problem 2 (Stu you must know). The course website https://www.cs.columbia.edu/~djhsu/
coms4771-f16/ has information about the course prerequisites, course requirements, academic rules
of conduct, and other information. You are required to understand this information and abide by
the rules of conduct, regardless of whether or not you can solve the following problems.
(a) True or false: I may share my homework write-up or code with another student as long as
(1) the write-up only contains solutions for at most half of the problems, (2) the code is at
most ve lines, and (3) we list each other as discussion partners on the submitted write-up.
(b) True or false: I may use any outside reference material to help me solve the homework
problems as long as I appropriately acknowledge these materials in the submitted write-up.
2
Problem 3 (More stu you should know). We’ll use the notation f : X ! Y to declare a function
f whose domain is the set X, and whose range is the set Y. For example, f : R ! R declares
a real-valued function over the real line. For a positive integer d, the d-dimensional vector space
called Euclidean space is denoted by Rd. For positive integers m and n, the space of mn matrices
over the real eld R is denoted by Rmn. Every matrix in Rmn can be regarded as a linear map
from Rn to Rm.
Let A;B 2 R22 be given by
A :=

1 2
2 4
#
; B :=

4 0
0 4
#
:
(\:=” is the notation used for \equals by de nition”.) Also let u := (2; 1) and v := (1; 2), which
are vectors in R2. Note that when we refer to vectors from Euclidean spaces in the context of
matrix-vector products, we always regard vectors (like u) as column vectors, and their transposes
(like u>) as row vectors:
u =

2
1
#
; u> =
h
2 1
i
:
(a) What is the rank of A?
(b) What is Au + Bv?
(c) What is u>Av?
(d) The (Euclidean) norm (or length) of a vector x = (x1; x2; : : : ; xd) 2 Rd is denoted by kxk2,
and is equal to
q
x2
1 + x2
2 +    + x2
d. What is kuk2?
(e) Let f : R2 ! R be the function de ned by
f(x) := x>(A + B)x :
The gradient of a real-valued function g : Rd ! R at a point z 2 Rd, denoted by rg(z), is
the vector  = (1; 2; : : : ; d) where
i := @
@xi
g(x)

x=z
for all i = 1; 2; : : : ; d :
What is rf(v)?
(f) The unit circle in R2 is the set of vectors in R2 with unit length, i.e., fx 2 R2 : kxk2 = 1g.
Which vector in the unit circle minimizes f (de ned above), and what is the value of f
evaluated at this vector? (Hint: think about eigenvectors.)
3
Problem 4 (Random stu you should know). A (discrete) probability space is a pair (
; P), where

is a (discrete) set called the sample space, and P :
! R is a real-valued function on
called
the probability distribution, which must satisfy P(!)  0 for all ! 2
, and
P
!2
P(!) = 1. An
event A is a subset of
, and the probability of A, denoted by P(A) (somewhat abusing notation),
is equal to
P
!2A P(!).
(a) A fair coin is tossed three times. Consider the three events:
• A: the outcome of the rst toss is heads.
• B: the outcome of the second toss is tails.
• C: the outcomes of all three tosses are the same.
• D: exactly one of the outcomes is heads.
Which of the following pairs of events are independent?
• A and B.
• A and C.
• A and D.
• C and D.
(b) A student applies to two schools: Trump University and Columbia University. The student
has a probability of 0:5 of being accepted to Trump, and a probability of 0:3 of being accepted
to Columbia. The probability of being accepted by both is 0:2. What is the probability that
the student is accepted to Columbia, given that the student is accepted at Trump?
A random variable (r.v.) on (
; P) is a real-valued function X:
! R. The notation X  P
declares the r.v. X and associates it with the probability distribution P. (We’ll often leave the
probability space implicit.) The expected value (a.k.a. expectation or mean) of X, written E(X), is
the average value of X under the distribution P:
E(X) :=
X
!2

X(!)  P(!) :
An equivalent de nition of E(X) is E(X) :=
P
x x  P(X = x), where the summation is taken over
all x in the range of X, and P(X = x) is shorthand for P(f! 2
: X(!) = xg).
(c) Consider the sample space
= f1; 2; : : : ; 6g  f1; 2; : : : ; 6g, and let P be the uniform distri-
bution over
, i.e., P(a; b) = 1=36 for each (a; b) 2
. Let X be the random variable de ned
by X(a; b) = minfa; bg for each (a; b) 2
.
For each x 2 f1; 2; : : : ; 6g, what is P(X = x)?
(d) Continuing from (c), what is the expected value of X?
(e) A biased coin with P(heads) = 1=5 is tossed repeatedly until heads comes up. What is the
expected number of tosses?
(f) You create a random sentence of length n by repeatedly picking words at random from the
vocabulary fa; is; not; roseg, with each word being equally likely to be picked. What is the
expected number of times that the phrase \a rose is a rose” will appear in the sentence?
4
Problem 5 (More random stu you should know). We often encounter probability spaces (
; P)
where
is not a discrete set. In this class, the only random variables we’ll consider on such spaces
will either have a discrete image (i.e., fX(!) : ! 2
g is a discrete set) or have a probability density
function p: R ! R, which is a non-negative real-valued function on R such that, for any open
interval (a; b) = fx 2 R : a < x < bg  R,
P(X 2 (a; b)) = P(f! 2
: X(!) 2 (a; b)g) =
Z
(a;b)
p(x) dx :
Random variables with probability density functions will be called continuous random variables.
(a) Let X be a continuous random variable with probability density function p given by
p(x) :=
(
0 if x < 0 ;
e􀀀x if x  0 :
Here,  is a positive number (typically called the rate parameter). If P(X  1000000) = 0:5,
then what is the value of ?
(b) Let X be a standard normal random variable, i.e., a continuous random variable whose density
is the standard normal density p(x) := e􀀀x2=2=
p
2 for all x 2 R. De ne the random variable
Y on the same probability space as X by Y := X2, i.e., Y (!) := X(!)2 for all ! 2
. What
are E(X) and E(Y )?
A collection of continuous random variables X1;X2; : : : ;Xd, all de ned on the same probability
space, has a (joint) probability density function p: Rd ! R if, for any A  Rd,
P((X1;X2; : : : ;Xd) 2 A) =
Z
A
p(x1; x2; : : : ; xd) dx1 dx2    dxd :
We’ll often collect several random variables, such as X1;X2; : : : ;Xd, into a random vector X =
(X1;X2; : : : ;Xd). So the equation above can be written as P(X 2 A) =
R
A p(x) dx.
(c) Suppose the pair of random variables (X1;X2) has probability density function p given by
p(x1; x2) :=
(
c if 0  x1  0:5 and 0  x2  1 ;
0 otherwise :
Here, c is a constant (that does not depend on x1 or x2). What should be the value of c so
that p is a valid probability density function?
(d) Continuing from (c), what is the probability that X2  X1?
(e) Continuing from (c), de ne another random variable Y on the same probability space as X1
and X2 by
Y :=
(
1 if X1 > 2X2 ;
􀀀1 otherwise :
Are X1 and Y independent? What is the expected value of Y ?
(f) Continuing from (c), de ne yet another random variable Z on the same probability space as
X1 and X2 by
Z :=
(
1 if X2 > 1=2 ;
􀀀1 otherwise :
Are X1 and Z independent? What is the expected value of X1Z?
5
Problem 6 (Google Cloud; optional but recommended). Set up a virtual machine on Google
Cloud. Figure out how to install some useful Python packages like numpy, scipy, scikit-learn,
etc. Download the OCR image data set ocr.mat from Courseworks, and load it into memory:
from scipy .io import loadmat
ocr = loadmat (‘ocr.mat ‘)
This le contains four di erent matrices called data, labels, testdata, and testlabels. For
example, data represents a 60000784 matrix, which you can verify using the following command:
ocr[‘data ‘]. shape
Using the numpy and scipy libraries, write some code to compute the average squared Euclidean
norm of the rows of data. The following functions may be useful:
• numpy.apply_along_axis
• numpy.linalg.norm
• numpy.mean
The result should be around 127:642. You don’t need to submit anything for this problem.
6