## Description

## 1. Medical image estimation.

Suppose $x_i$, $i = 1, \ldots, n$, are independent Poisson random variables with

$$P(x_i = k) = \frac{e^{-\mu_i}\mu_i^k}{k!}$$

with unknown means $\mu_i$.

The variables $x_i$ represent the number of times that each of $n$ possible independent events occurs during a certain period. In emission tomography, they may represent the number of photons emitted by $n$ sources.

We consider an experiment designed to determine the means $\mu_i$. The experiment involves $m$ detectors. If event $i$ occurs, it is detected by detector $j$ with probability $p_{ji}$. We assume the probabilities $p_{ji}$ are given (with $p_{ji} > 0$ and $\sum_{j=1}^{m} p_{ji} \le 1$). The total number of events recorded by detector $j$ is denoted by $y_j$,

$$y_j = \sum_{i=1}^{n} y_{ji}, \qquad j = 1, \ldots, m,$$

where $y_{ji}$ is the number of occurrences of event $i$ recorded by detector $j$.

Formulate the maximum likelihood estimation problem of estimating the means $\mu_i$, based on observed values of $y_j$, $j = 1, \ldots, m$. Does the likelihood function have a unique maximizer? (Hint: the variables $y_{ji}$ have Poisson distributions with means $p_{ji}\mu_i$. The sum of $n$ independent Poisson variables with means $\lambda_1, \ldots, \lambda_n$ has a Poisson distribution with mean $\lambda_1 + \cdots + \lambda_n$.)
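By the hint, each detector count $y_j$ is Poisson with mean $\lambda_j = \sum_{i=1}^{n} p_{ji}\mu_i$, so the log-likelihood is $\sum_j (y_j \log \lambda_j - \lambda_j)$ up to constants. The following sketch sets up and solves this ML problem numerically on a small synthetic instance (SciPy is assumed available; the sizes, probabilities, and variable names are illustrative, not from the assignment's data):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Hypothetical small instance: n sources, m detectors.
n, m = 3, 5
P = rng.uniform(0.05, 0.3, size=(m, n))   # detection probabilities p_ji
mu_true = np.array([20.0, 5.0, 12.0])     # means we would like to recover
y = rng.poisson(P @ mu_true)              # y_j ~ Poisson(sum_i p_ji * mu_i)

def neg_log_lik(mu):
    lam = P @ mu                                  # detector means lambda_j
    return float(np.sum(lam - y * np.log(lam)))   # constants log(y_j!) dropped

res = minimize(neg_log_lik, x0=np.ones(n),
               bounds=[(1e-8, None)] * n, method="L-BFGS-B")
mu_hat = res.x
```

Maximizing the likelihood is done here by minimizing its negation, which is convex in $\mu$, so any local solution the solver finds is global.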

## 2. Logistic regression.

Given $n$ observations $(x_i, y_i)$, $i = 1, \ldots, n$, with $x_i \in \mathbb{R}^p$, $y_i \in \{0, 1\}$, and parameters $a \in \mathbb{R}^p$ and $b \in \mathbb{R}$, consider the log-likelihood function for logistic regression:

$$\ell(a, b) = \sum_{i=1}^{n} \Big\{ y_i \log h(x_i; a, b) + (1 - y_i) \log\big(1 - h(x_i; a, b)\big) \Big\},$$

where $h(x; a, b) = 1/(1 + e^{-(a^\top x + b)})$ is the logistic (sigmoid) function.

(a) Derive the Hessian $H$ of this function and show that $H$ is negative semi-definite (this implies that $\ell$ is concave and has no local maxima other than the global maximum).

(b) Use the data files logit-x.dat and logit-y.dat, which contain the predictors $x_i \in \mathbb{R}^2$ and the responses $y_i \in \{0, 1\}$, respectively, for the logistic regression problem. Implement Newton's method for optimizing $\ell(a, b)$ and apply it to fit a logistic regression model to the data.

Initialize Newton's method with $a = 0$, $b = 0$. Plot the value of the log-likelihood function versus the iteration number. What are the coefficients $a$ and $b$ from your fit?

(c) Find a value of the step size that gives you convergence, and another (larger) value of the step size for which your algorithm diverges.
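A minimal sketch of parts (b) and (c), assuming $h$ is the usual sigmoid and using synthetic data in place of logit-x.dat / logit-y.dat (whose file format is not shown here). The `step` parameter is the damping factor relevant to part (c); `step=1.0` is the pure Newton update:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def newton_logistic(X, y, step=1.0, iters=20):
    """Damped Newton ascent on the logistic log-likelihood.

    X: (n, p) predictors, y: (n,) labels in {0, 1}.
    Returns (a, b) and the per-iteration log-likelihood trace."""
    n, p = X.shape
    Z = np.hstack([X, np.ones((n, 1))])   # append intercept column
    theta = np.zeros(p + 1)               # theta = (a, b), initialized at zero
    trace = []
    for _ in range(iters):
        s = sigmoid(Z @ theta)
        eps = 1e-12                       # guard against log(0)
        trace.append(float(y @ np.log(s + eps) + (1 - y) @ np.log(1 - s + eps)))
        grad = Z.T @ (y - s)                       # gradient of l(a, b)
        H = -(Z * (s * (1 - s))[:, None]).T @ Z    # Hessian (negative semi-definite)
        theta = theta - step * np.linalg.solve(H, grad)  # damped Newton step
    return theta[:p], theta[p], trace

# Synthetic stand-in drawn from a logistic model with known coefficients
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X @ np.array([1.5, -2.0]) + 0.5 + rng.logistic(size=200) > 0).astype(float)
a, b, trace = newton_logistic(X, y)
```

Rerunning with a much larger `step` can make the iterates overshoot so that the log-likelihood trace falls instead of rising, which is the divergence behavior part (c) asks you to exhibit.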

## 3. Locally weighted linear regression.

Consider a linear regression problem in which we want to weight different training examples

differently. Specifically, suppose we want to minimize

$$J(\theta) = \frac{1}{2} \sum_{i=1}^{n} w_i (\theta^\top x_i - y_i)^2.$$

In class, we have worked out what happens for the case where all the weights are the same.

In this problem, we will generalize some of those ideas to the weighted setting, and also

implement the locally weighted linear regression algorithm.

(a) Show that $J(\theta)$ can also be written as

$$J(\theta) = (X\theta - y)^\top W (X\theta - y)$$

for an appropriate diagonal matrix $W$, matrix $X$, and vector $y$. State clearly what these matrices and vectors are.
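As a sanity check for part (a), the two expressions can be compared numerically. The sketch below uses $W = \mathrm{diag}(w_i/2)$, one consistent choice (the $1/2$ factor must be absorbed somewhere), with $X$ stacking the $x_i^\top$ as rows and $y$ stacking the $y_i$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 6, 3
X = rng.normal(size=(n, p))          # rows are x_i^T
y = rng.normal(size=n)
w = rng.uniform(0.1, 2.0, size=n)    # arbitrary positive weights
theta = rng.normal(size=p)

J_sum = 0.5 * np.sum(w * (X @ theta - y) ** 2)   # definition of J(theta)
W = np.diag(w / 2.0)                              # one consistent choice of W
r = X @ theta - y
J_mat = r @ W @ r                                 # (X theta - y)^T W (X theta - y)
```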

(b) Suppose we have samples $(x_i, y_i)$, $i = 1, \ldots, n$, of $n$ independent examples, but in which the $y_i$'s were observed with different variances, and

$$p(y_i \mid x_i, \theta) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\!\left( -\frac{(y_i - \theta^\top x_i)^2}{2\sigma_i^2} \right),$$

i.e. $y_i$ has mean $\theta^\top x_i$ and variance $\sigma_i^2$ (where the $\sigma_i^2$ are fixed, known constants). Show that finding the maximum likelihood estimate of $\theta$ reduces to solving a weighted linear regression problem. State clearly what the $w_i$'s are in terms of the $\sigma_i^2$'s.

(c) Use the data files rx.dat and ry.dat, which contain the predictors $x_i$ and the responses $y_i$, respectively, for our problem. Implement gradient descent for (unweighted) linear regression, as derived in class, on this dataset, and plot on the same figure the data and the straight line resulting from your fit. (Remember to include the intercept term.)
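A sketch of the gradient-descent loop on synthetic one-dimensional data standing in for rx.dat / ry.dat (the files' contents are not shown here, so the data below is fabricated for illustration); plotting is omitted:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-5, 5, size=100)                      # stand-in for rx.dat
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=100)   # stand-in for ry.dat

X = np.column_stack([x, np.ones_like(x)])   # intercept term appended
theta = np.zeros(2)
lr = 0.01                                   # step size
for _ in range(500):
    grad = X.T @ (X @ theta - y)            # gradient of (1/2)||X theta - y||^2
    theta -= lr * grad / len(y)             # averaged over n for a stable step
slope, intercept = theta
```

On this synthetic data the fit recovers a slope near 2 and an intercept near 1; for the plot, evaluate `slope * x + intercept` over the range of `x` with matplotlib.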

(d) Implement locally weighted linear regression on this dataset, using gradient descent, and plot on the same figure the data and the line resulting from your fit, using the weights

$$w_i = \exp\!\left(-\frac{x_i^2}{20}\right).$$

Plot $J(\theta)$ versus the iteration number.
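A sketch of the weighted gradient-descent loop for part (d), again on synthetic stand-in data for rx.dat / ry.dat, using the weights $w_i = \exp(-x_i^2/20)$ from the problem and tracking $J(\theta)$ at each iteration for the requested plot:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-5, 5, size=100)                      # stand-in for rx.dat
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=100)   # stand-in for ry.dat

w = np.exp(-x ** 2 / 20.0)                  # weights w_i from the problem
X = np.column_stack([x, np.ones_like(x)])   # intercept term appended
theta = np.zeros(2)
lr = 0.02
J_trace = []
for _ in range(500):
    r = X @ theta - y
    J_trace.append(0.5 * np.sum(w * r ** 2))   # J(theta), for the plot
    grad = X.T @ (w * r)                       # gradient of the weighted J
    theta -= lr * grad / len(y)
```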

## 4. Exponential family and Fisher information.

A PDF $f(x \mid \theta)$ of a random variable is said to be from an exponential family if we can write

$$f(x \mid \theta) = g(x)\, e^{\beta(\theta) + h(x)^\top \gamma(\theta)}$$

for some $g(x)$, $\beta(\theta)$, $h(x)$, and $\gamma(\theta)$.

(a) Show that the Bernoulli, Binomial, Poisson, Exponential, and Gaussian distributions all belong to the exponential family. Their PDFs are given by

$$\begin{aligned}
\text{Bernoulli:}\quad & f(x \mid p) = p^x (1-p)^{1-x}, & x \in \{0, 1\} \\
\text{Binomial:}\quad & f(x \mid n, p) = \binom{n}{x} p^x (1-p)^{n-x}, & x \in \{0, 1, \ldots, n\} \\
\text{Poisson:}\quad & f(x \mid \lambda) = e^{-\lambda}\lambda^x / x!, & x \in \{0, 1, \ldots\} \\
\text{Exponential:}\quad & f(x \mid \lambda) = \lambda e^{-\lambda x}, & x \ge 0 \\
\text{Gaussian:}\quad & f(x \mid \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^p |\Sigma|}}\, e^{-\frac{1}{2}(x-\mu)^\top \Sigma^{-1}(x-\mu)}, & x \in \mathbb{R}^p
\end{aligned}$$

(b) Find the Fisher information for Bernoulli distribution.
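A numerical cross-check for part (b): the Fisher information equals the variance of the score $\frac{\partial}{\partial p}\log f(x \mid p)$, which can be estimated by simulation and compared against whatever closed form you derive. A sketch at an arbitrary $p$:

```python
import numpy as np

rng = np.random.default_rng(5)
p = 0.3
x = rng.binomial(1, p, size=200_000).astype(float)   # Bernoulli(p) samples

# Score function: d/dp log f(x|p) = x/p - (1 - x)/(1 - p)
score = x / p - (1 - x) / (1 - p)

fisher_mc = np.var(score)   # Fisher information = variance of the score
```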

## 5. House price dataset.

The HOUSES dataset contains a collection of recent real estate listings in San Luis Obispo

county and around it. The dataset is provided in RealEstate.csv.

The dataset contains the following fields:

• MLS: Multiple listing service number for the house (unique ID).

• Location: city/town where the house is located. Most locations are in San Luis Obispo county and northern Santa Barbara county (Santa Maria-Orcutt, Lompoc, Guadelupe, Los Alamos), but there are some out-of-area locations as well.

• Price: the most recent listing price of the house (in dollars).

• Bedrooms: number of bedrooms.

• Bathrooms: number of bathrooms.

• Size: size of the house in square feet.

• Price/SQ.ft: price of the house per square foot.

• Status: type of sale. Three types are represented in the dataset: Short Sale, Foreclosure, and Regular.

Fit a linear regression model to predict Price using the remaining factors (except Status), separately for each of the three types of sale: Short Sale, Foreclosure, and Regular.
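One way to set this up is with pandas grouping and NumPy least squares. The column names follow the field list above; excluding MLS (a unique ID) as a predictor is an assumption, and the demo DataFrame below is a synthetic stand-in, since in the actual assignment the data would come from `pd.read_csv("RealEstate.csv")`:

```python
import numpy as np
import pandas as pd

PREDICTORS = ["Bedrooms", "Bathrooms", "Size", "Price/SQ.ft"]  # MLS excluded: unique ID

def fit_by_status(df):
    """Fit one OLS model of Price per sale type; return {status: coefficients}.

    Coefficient order: PREDICTORS, then the intercept."""
    coefs = {}
    for status, g in df.groupby("Status"):
        X = np.column_stack([g[PREDICTORS].to_numpy(float), np.ones(len(g))])
        y = g["Price"].to_numpy(float)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # ordinary least squares
        coefs[status] = beta
    return coefs

# Tiny synthetic stand-in for RealEstate.csv, just to exercise the code path.
rng = np.random.default_rng(6)
n = 60
demo = pd.DataFrame({
    "Bedrooms": rng.integers(1, 6, n),
    "Bathrooms": rng.integers(1, 4, n),
    "Size": rng.uniform(800, 3500, n),
    "Price/SQ.ft": rng.uniform(150, 450, n),
    "Status": rng.choice(["Short Sale", "Foreclosure", "Regular"], n),
})
demo["Price"] = demo["Size"] * demo["Price/SQ.ft"]
coefs = fit_by_status(demo)
```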