# Stat 443 Assignment 1: Exploratory Data Analysis solution

\$30.00

Original Work ?

## Description

5/5 - (1 vote)

1. Rimouski is a city in Quebec, located in the Bas-Saint-Laurent region. Please go to the climate
monthly temperature series for Rimouski (with station name “Rimouski” and climate ID
7056480) from January 1954 to December 2016.

(a) Read in the data and create a time-series object for the monthly mean max temperature.
Plot the series against time (label your graph) and comment on any features of the data
that you observe.

In particular, address the following points:
• Does the series have a trend?
• Is the series stationary?
• Is there a seasonal variation? If so, what is the period of the seasonal effect?
• Would an additive or multiplicative model be more suitable to decompose the series?

(b) Are there any missing values in the series? If so, identify the year and month of all the
missing values. One commonly used imputation method is the LVCF (last value carried
forward), which imputes the missing data with the last observed value. Is it appropriate
to use the LVCF imputation method here? Justify your answer. If not, suggest an
adapted version of the LVCF method that would be appropriate to use here. Apply the
imputation to obtain a complete series.

(c) Create training and test datasets. The training dataset should include all observations
from the year 1954 to 2015. Use the observations of the year 2016 as the test set. You
can use the command window() on a ts object to split the data.

Using an additive model, decompose the (imputed) series into trend, seasonal, and error
components. Use both moving average smoothing (R function decompose()) and the
loess method (R function stl()). Plot both decompositions.

(d) Fit a linear model to the trend component (R function lm()) of the moving average
decomposition. Does the linear model provide evidence of a trend at the 95% confidence
level? Without doing any further analysis, would you use this trend component to
make predictions? Justify your answer using the linear model results and/or the trend
component plot.

(e) Predict the monthly mean max temperatures for the test dataset using the moving average decomposition. Compare your predictions under the following assumptions:
• There is a linear trend;
• There is no trend.

Compare your predictions with the test dataset graphically and compute the sample
mean squared prediction error (MSPE), defined as the average of the squared distances
between the predictions and observations in the test dataset. Which model has a smaller
MSPE?

(f) In time series analysis, it is often assumed that the error term follows a Gaussian white
noise process, i.e., Zt
i.i.d ∼ N (0, σ2
). Generate the correlogram and Q-Q plot of the error
component from the moving average decomposition, and discuss whether the Gaussian
white noise assumption is appropriate.

2. The file GSPC.csv contains 2527 daily closing prices of the Global S&P500 index spanning a
decade from January 2, 1985 until December 29, 1994.

(a) Create a time series object for the adjusted daily closing price of the S&P500 index and
plot it over time. Comment on the features, including stationarity, trend and seasonality.

(b) Transform the original series to daily log-returns by taking the logarithmic difference,
i.e., Xt = ln(St/St−1), where St denotes the adjusted daily closing price at time t. Plot
the daily log-returns over time and comment on the stationarity of the series. Why is it
more convenient to work with the daily log-return series?

(c) Generate the correlogram for both the daily log-return series {Xt} and the absolute value
of daily log-returns {|Xt
|}. Compare and comment on what you observe.