Statistical Learning Homework-01 solution


General Instructions
• You can use any programming language you want, as long as your work is runnable/correct/readable. Two examples:
– In R: it would be nice to upload a well-edited and working R Markdown file (.Rmd) + its HTML output.
– In Python: it would be nice to upload a well-edited and working Jupyter notebook (or similar).
In case of R
If you go for R, to be sure that everything is working, start RStudio and create an empty project called HW1. Now open a new R Markdown file (File > New File > R Markdown…); set the output to HTML mode, press OK and then click on Knit HTML. This should produce an HTML file. You can now start editing this file to produce your homework submission.
• For more info on R Markdown, check the support webpage: R Markdown from RStudio.
• For more info on how to write math formulas in LaTeX: Wikibooks.
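If you prefer working from the console rather than the Knit button, the same HTML output can also be produced programmatically; a minimal sketch, assuming the file has been saved as HW1.Rmd and the rmarkdown package is installed:

# (Optional) knit the homework to HTML from the R console instead of the Knit button
rmarkdown::render("HW1.Rmd", output_format = "html_document")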
The Data
For this first homework you’ll be working on the WMAP data described on page 45 (Example 4.4) of our purple book. We are talking about a snapshot of our universe in its infancy, something like 379,000 years after the Big Bang, nothing compared to the estimated age of our universe (something like 14 billion years!).
The map above was taken by the Wilkinson Microwave Anisotropy Probe (WMAP) and shows differences across the sky in
the temperature of the cosmic microwave background (CMB), the radiant heat remaining from the Big Bang. The average
temperature is 2.73 degrees above absolute zero but the temperature is not constant across the sky. The fluctuations in the
temperature map provide information about the early universe. Indeed, as the universe expanded, there was a tug of war
between the force of expansion and contraction due to gravity. This caused acoustic waves in the hot gas, which is why there
are temperature fluctuations.
The strength of the temperature fluctuations f(x) at each frequency (or multipole) x is called the power spectrum and
this power spectrum can be used by cosmologists to answer cosmological questions. For example, the relative abundance
of different constituents of the universe (such as baryons and dark matter) corresponds to peaks in the power spectrum.
Through a complicated procedure (not described here), the temperature map can be reduced to the following scatterplot of
power versus frequency:
# Load the data
wmap <- read.csv("~/wmap.csv")
# Plot
lay.mat = matrix(c(1, 1, 2, 3), nrow = 2, byrow = T)
layout(lay.mat)
# All
plot(wmap$x, wmap$y, pch = 21, bg = "yellow", cex = .7,
     main = "CMB data (WMAP)", xlab = "Multipole", ylab = "Power")
polygon(c(0, 400, 400, 0),
        c(min(wmap$y), min(wmap$y), max(wmap$y), max(wmap$y)),
        border = NA, col = rgb(0, 0, 0, .3))
# First bump
plot(wmap$x[1:400], wmap$y[1:400], pch = 21, bg = "green", cex = .7,
     main = "Main bump", xlab = "Multipole", ylab = "Power")
# Secondary bump(s)
plot(wmap$x[-(1:400)], wmap$y[-(1:400)], pch = 21, bg = "red", cex = .7,
     main = "Secondary bump(s) (?)", xlab = "Multipole", ylab = "Power")
[Figure: three panels of Power vs Multipole. Top: "CMB data (WMAP)", the full data with the first 400 multipoles shaded; bottom left: "Main bump", the first 400 points; bottom right: "Secondary bump(s) (?)", the remaining points.]
The horizontal axis is the multipole moment, essentially the frequency of fluctuations in the temperature field of the CMB. The vertical axis is the power or strength of the fluctuations at each frequency. The top plot shows the full data set. The two bottom plots zoom in on the first 400 and the next 500 data points. The 1st peak, around x ≈ 200, is obvious. But many “official” fits (see Fig. 2 on that page) also show a 2nd and 3rd peak further to the right...
...are they really there? Does the data support (without further a priori info) these secondary features?
Part I: Polynomials are good...
...but smooth piecewise polynomials (a.k.a. splines) are better!
In our initial supervised toy example we tried to recover from samples a non-linear (in the original 1-dimensional feature-space) function using a linear model in a higher-dimensional transformed feature-space (polynomials of degree d).
Now, polynomials are fine, but they are global objects (in principle they span the entire real line) and may not be able to capture local structures of the target function.
recipe for a solution: 1. take a polynomial, 2. break it down into small (almost) free-to-move chunks, 3. shake a bit, 4. glue them back together adding some regularity constraints (continuity, differentiability, etc.) as needed... a spline is born...
more formally: any d-th order spline f(·) is a piecewise polynomial function of degree d that is continuous and has continuous derivatives of orders {1, . . . , d − 1} at the so-called knot points. Specifically, how do we build a generic d-th order spline f(·)? We start from a bunch of points, say q, that we call knots ξ1 < · · · < ξq, and we then ask that...
1. ...f(·) is some polynomial of degree d on each of the intervals (−∞, ξ1], [ξ1, ξ2], [ξ2, ξ3], . . . , [ξq, +∞);
2. ...its j-th derivative f^(j)(·) is continuous at {ξ1, . . . , ξq} for each j ∈ {0, 1, . . . , d − 1}.
The figure below, from Chapter 5 of ESL, illustrates the effects of enforcing continuity at the knots, across various orders of the derivative, for a cubic piecewise polynomial.
Splines have some amazing properties, and they have been a topic of interest among statisticians and mathematicians for a very long time (classic vs recent). But, given a set of points ξ1 < ξ2 < · · · < ξq, is there a quick-and-dirty way to describe/generate the whole set of d-th order spline functions over those q knots? The easiest one (not the best!) is to start from the truncated power functions Gd,q = {g1(x), . . . , gd+1(x), g(d+1)+1(x), . . . , g(d+1)+q(x)}, defined as

$$g_1(x) = 1, \quad g_2(x) = x, \quad \ldots, \quad g_{d+1}(x) = x^d$$

and

$$\Big\{\, g_{(d+1)+j}(x) = (x - \xi_j)_+^d \,\Big\}_{j=1}^{q}, \qquad \text{where } (x)_+ = \max\{0, x\}.$$
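As a concrete illustration, here is a minimal R sketch that evaluates the (d+1)+q functions of Gd,q at a vector of inputs and stacks them as the columns of a design matrix. The helper name tpower_basis and its defaults are our own choices, not part of the homework text:

# Illustrative helper (not from the homework text): build the (d+1)+q columns
# of the truncated power basis G_{d,q} evaluated at the points in x.
tpower_basis <- function(x, knots, d = 3) {
  # polynomial part: 1, x, x^2, ..., x^d
  P <- outer(x, 0:d, `^`)
  # truncated power part: (x - xi_j)_+^d, one column per knot xi_j
  Tp <- sapply(knots, function(xi) pmax(x - xi, 0)^d)
  B <- cbind(P, Tp)
  colnames(B) <- c(paste0("x^", 0:d), paste0("tp", seq_along(knots)))
  B
}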
Then, if f(·) is a d-th order spline with knots {ξ1, . . . , ξq}, you can show it can be obtained as a linear combination over Gd,q:

$$f(x) \;=\; \sum_{j=1}^{(d+1)+q} \beta_j \cdot g_j(x), \qquad \text{for some set of coefficients } \boldsymbol{\beta} = \big(\beta_1, \ldots, \beta_{d+1}, \beta_{(d+1)+1}, \ldots, \beta_{(d+1)+q}\big)^{\mathsf{T}}.$$
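With such a basis matrix in hand, regression on splines boils down to ordinary least squares on the transformed features. A hedged sketch of how β could be estimated on the WMAP data, assuming the tpower_basis helper above; the cubic degree and the three knots at the quartiles of x are arbitrary illustrative choices:

# Fit a cubic (d = 3) regression spline to the WMAP data by least squares
# on the truncated power basis built above (illustrative knots only).
d     <- 3
knots <- quantile(wmap$x, probs = c(.25, .5, .75))   # 3 illustrative knots
B     <- tpower_basis(wmap$x, knots = knots, d = d)
fit   <- lm(wmap$y ~ B - 1)   # "-1": the intercept is already the x^0 column
beta  <- coef(fit)            # the (d+1)+q estimated coefficients
plot(wmap$x, wmap$y, pch = 21, bg = "gray", cex = .7,
     main = "Cubic regression spline (illustrative)",
     xlab = "Multipole", ylab = "Power")
lines(wmap$x, fitted(fit), col = "blue", lwd = 2)

The final lines() call overlays the fitted spline on the scatterplot, so you can eyeball how well the bump(s) are captured.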
idea: let’s perform regression on splines instead of polynomials! In other words, as in our initial toy example, given inputs x = {x1, . . . , xn} and responses y = {y1, . . . , yn}, consider fitting functions f(·) that are d-th order splines with knots at some chosen locations, typically the first q quantiles of x. This method is dubbed regression splines and it is different from another famous technique called smoothing splines (we will talk about it later as an example of a (Mercer) kernel method).
remark: here you have many tuning parameters (the degree d and the number + position of the knots) to be selected predictively via Cp or some flavor of cross-validation (sample-splitting, k-fold CV, LOOCV, GCV). Although there is a large literature on knot selection for regression splines via greedy methods like recursive partitioning, here we will go for an easy-to-implement option.
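For instance, a minimal sketch of one such easy-to-implement option, again assuming the tpower_basis helper defined earlier; the 5 folds, the grid of q values, the fixed cubic degree, and keeping the knots fixed across folds are all our own illustrative simplifications:

# Pick the number of knots q of a cubic regression spline by 5-fold CV,
# placing the q knots at equispaced empirical quantiles of x.
cv_spline <- function(x, y, q, d = 3, k = 5) {
  # knots fixed from the full x for simplicity (illustrative shortcut)
  knots <- quantile(x, probs = seq(0, 1, length.out = q + 2)[-c(1, q + 2)])
  folds <- sample(rep(1:k, length.out = length(x)))
  mse   <- numeric(k)
  for (f in 1:k) {
    tr       <- folds != f
    B_train  <- tpower_basis(x[tr],  knots = knots, d = d)
    B_test   <- tpower_basis(x[!tr], knots = knots, d = d)
    beta_hat <- coef(lm(y[tr] ~ B_train - 1))
    mse[f]   <- mean((y[!tr] - B_test %*% beta_hat)^2)
  }
  mean(mse)
}

q_grid <- 2:20
cv_err <- sapply(q_grid, function(q) cv_spline(wmap$x, wmap$y, q = q))
q_best <- q_grid[which.min(cv_err)]

The same skeleton can be adapted to LOOCV (k = n), to a plain train/test split, or to a joint grid over the degree d and the number of knots q.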