Description
1. The following gases carbon dioxide (CO2), methane (CH4), nitrous oxide (N2O) and
Ozone (O3) in the atmosphere are implicated in increasing global temperatures, and are
known as greenhouse gases.
The concentration of these gases in the atmosphere and
corresponding global average temperatures obtained from the EPA website
(https://www.epa.gov/climate-indicators/weather-climate) between the years 1984 to
2014 is given in the Excel file ghg-concentrations_1984-2014.xlsx (units for different
variables are also given in Excel sheet).
(a) Develop a multilinear regression model between global temperature (deviations) and
concentrations of greenhouse gases using OLS. Is the global temperature positively
correlated with increase in the concentration of these gases?
(b) Estimate the error variance in temperature measurements and confidence intervals
(CIs) for all regression coefficients. Based on residual analysis, remove samples
suspected of being outliers (one at a time) until there are no outliers.
(c) Improve the regression model obtained in step (b) by dropping unimportant
(insignificant) variables (one at a time).
(d) The effect of different gases on the global temperature is expressed in terms of CO2
equivalents or global warming potential (GWP). Is it possible to make any inference
regarding GWP of the gases from the regression coefficients? Compare the GWP
obtained from regression coefficients to the values obtained over a 20 year time horizon:
CO2 (1), CH4 (86), N2O (289).
Notes: Water vapour, which is present in significant amount is the atmosphere is also a
greenhouse gas, but it remains almost constant and is relatively unaffected by human
activity. CFCs/HCFCs which are also greenhouse gases are however being monitored
only in recent years.
2. Consider the problem of developing a correlation between saturated pressure (Psat) and
saturated temperature T (boiling point). For pure components, the Antoine equation
given below generally fits the data well
𝑙𝑙𝑙𝑙 𝑃𝑃𝑠𝑠𝑠𝑠𝑠𝑠 = 𝐴𝐴 − 𝐵𝐵/(𝑇𝑇 + 𝐶𝐶)
For n-hexane, the values of the constants are A = 14.0568, B = 2825.42, and C = 230.44
where Psat is given in kPa and T in deg C. Using this correlation a data set consisting of
100 samples have been generated in the temperature range 10 – 70 deg C. Gaussian
measurements errors to both the true temperature and saturated pressures with standard
deviations of 0.18 deg C and 2 kPa, respectively, have been added to generate the
measurements (available in vpdata.mat)
(a) The Classius-Clapeyron equation is a theoretically derived model between Psat and T
and is given by
𝑙𝑙𝑙𝑙 𝑃𝑃𝑠𝑠𝑠𝑠𝑠𝑠 = 𝐴𝐴′ − 𝐵𝐵′
𝑇𝑇
Assuming that temperature measurements are noise-free and pressure measurements are
noisy, use linear regression to obtain estimates of parameters A’ and B’.
(b) Assuming that temperature measurements are noise-free and pressure measurements
are noisy, use nonlinear regression to obtain estimates of parameters A, B and C.
(c) Assuming both pressures and temperature measurements are noisy apply weighted
total least squares obtain estimates of parameters A, B, and C. Use the inverse of standard
deviation of errors as weights to set up the nonlinear optimization problem.
(d) For the models obtained in (a), (b), and (c) report the maximum error in predicting the
saturated pressures using the identified model for the sample data.
Use MATLAB function lsqnonlin to estimate the nonlinear model parameters in (b) and
(c)
3. A zoologist obtained measurements of the mass (in grams), the snout-vent length
(SVL) and hind limb span (HLS) in mm of 25 lizards. The mean and covariance
matrix of the data about the mean are given by
=
=
34 102 186
21 64 102
7 21 34
129
68
9
x S
(a) The largest eigenvalue of the above covariance matrix is 250.4. Determine the
normalized eigenvector corresponding to this eigenvalue. Also determine the remaining
eigenvalues and corresponding mutually orthogonal eigenvectors.
(b) How many principal components should be retained, if at least 95% of the variance
in the data has to be captured?
(c) Assuming that there are two linear relationships among the three variables, determine
one possible set of these linear relations.
(d) Using the PCA model, determine the scores for a female lizard with the following
measurements: mass = 10.1 gms, SVL = 73mm and HLS = 135.5mm.
(e) Using the PCA model, estimate the mass of a lizard whose measured SVL is 73mm
(f) Using the PCA model, estimate the mass of a lizard whose measured SVL is 73mm
and measured HLS is 135.5 mm.
Note: The first and second problem can be solved using MATLAB, while the third
problems should be solved manually and can be verified using MATLAB. Submit
the MATLAB codes along with your solution