Description
1. Use a hard-margin SVM to classify cars as having automatic or manual transmissions.
- Read http://www.stat.wisc.edu/~jgillett/451/01/mtcars30.csv into a DataFrame. (This is the mtcars data frame from R with two of its rows removed to get linearly separable data.)
- Make a 30×2 numpy array X from the mpg (miles per gallon) and wt (weight in 1000s of pounds) columns. Make an array y from the am column (where 0=automatic or 1=manual transmission).
- Train an SVM using kernel='linear' and C=1000. Print its coefficients and intercept.
- Report the training accuracy. (It’s given by clf.score(X, y).)
- Predict the transmission for a car weighing 4000 pounds (wt=4) that gets 20 mpg.
- Use five plt.plot() calls to make a figure with wt on its x-axis and mpg on its y-axis including:
  - the automatic transmission cars in red
  - the manual transmission cars in blue
  - the decision boundary (the center line of the road)
  - the lower margin boundary (the left side of the road)
  - the upper margin boundary (the right side of the road)
  - a reasonable title, axis labels, and legend
In [2]:
# ... your code here ...
The decision boundary is -8.24 * weight + -0.309 * mileage + 32.0 = 0. The training accuracy is 1.0. We predict that a car weighing 4 thousand pounds that gets 20 mpg has transmission type 0 (where 0=automatic, 1=manual).
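As a rough illustration of how the reported boundary turns into a prediction and into the three lines of the plot, the sketch below plugs the rounded coefficients from the sentence above into plain Python. Nothing is refit here, the helper names (`decision_value`, `predict_transmission`, `mpg_on_line`) are hypothetical, and it assumes the usual convention for a binary linear SVM that a positive score maps to class 1 and that the margin boundaries sit where the score equals -1 and +1.

```python
# Rounded coefficients as reported in the text above (not refit here).
W_WT = -8.24     # coefficient on wt (weight in 1000s of pounds)
W_MPG = -0.309   # coefficient on mpg
B = 32.0         # intercept


def decision_value(wt, mpg):
    """Signed score w . x + b; its sign determines the predicted class."""
    return W_WT * wt + W_MPG * mpg + B


def predict_transmission(wt, mpg):
    """Return 1 (manual) if the score is positive, else 0 (automatic)."""
    return 1 if decision_value(wt, mpg) > 0 else 0


def mpg_on_line(wt, level=0.0):
    """Solve W_WT*wt + W_MPG*mpg + B = level for mpg.

    level=0 gives the decision boundary; level=-1 and level=+1 give
    the two margin boundaries.
    """
    return (level - B - W_WT * wt) / W_MPG


score = decision_value(4, 20)
label = predict_transmission(4, 20)
print(f"score = {score:.2f}, predicted class = {label}")
```

For a plot with wt on the x-axis, evaluating `mpg_on_line` over a range of wt values with `level` set to -1, 0, and +1 gives the y-values for the three lines. Because the coefficients are rounded, the lines will differ slightly from those of a freshly trained model.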
2. Make three linear regression models.
2a: Make a simple regression model by hand.
Use the matrix formula $w = (X^T X)^{-1} X^T y$ we developed in class to fit these three points: (0, 5), (2, 1), (4, 3). (Use linear_model.LinearRegression(), if you wish, to check your work.)
… your answer here (just give the model, $y = wx + b$) …
intercept=4.0, slope=-0.5
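One way to check the hand calculation is to evaluate the matrix formula numerically. The sketch below assumes the design matrix carries a leading column of ones (so the first weight is the intercept), which may differ from the column order used in class, and it solves the normal equations with `np.linalg.solve` instead of forming the inverse explicitly.

```python
import numpy as np

# The three points from the exercise: (0, 5), (2, 1), (4, 3).
x = np.array([0.0, 2.0, 4.0])
y = np.array([5.0, 1.0, 3.0])

# Design matrix with a leading column of ones so that w[0] is the intercept.
X = np.column_stack([np.ones_like(x), x])

# Normal equations: solve (X^T X) w = X^T y rather than forming the inverse.
w = np.linalg.solve(X.T @ X, X.T @ y)
intercept, slope = w
print(f"intercept={intercept:.1f}, slope={slope:.1f}")
```

Here $X^T X$ is [[3, 6], [6, 20]] and $X^T y$ is [9, 14], so the solution is the same intercept of 4.0 and slope of -0.5 given above.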
2b: Make a simple linear regression model from real data.
Estimate the average daily trading volume of a Dow Jones Industrial Average stock from its market capitalization. That is, use $y$ = AvgVol vs. $x$ = MarketCap.
- Read http://www.stat.wisc.edu/~jgillett/451/data/DJIA.csv into a DataFrame.
- Find the model. Print its equation.
- Print its $R^2$ value (the proportion of variability in $y$ accounted for by $x$ via the linear model, given by model.score(X, y)).
- Use the model to predict the volume for a company with market capitalization of 0.25e12 (a quarter-trillion dollars); add this as a red point on your plot.
- Say what happens to Volume as Market Capitalization increases. (Use a Markdown cell.)
In [5]:
# ... your code here ...
The model is Volume = 2.68e-05 * (Market Capitalization) + 3.41e+06. R^2 is 0.705. We predict a Volume of 1.01e+07 for a company with Market Capitalization 2.5e+11 (see red dot).
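To make the $R^2$ definition and the prediction step concrete, here is a small sketch that does not touch the DJIA file. It computes $R^2$ by hand as 1 minus the ratio of residual to total sum of squares, using the three points and fitted line from part 2a as a stand-in, and then evaluates the rounded 2b model reported above at a market capitalization of 0.25e12. The helper name `r_squared` is hypothetical.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Points and fitted line from part 2a: y = -0.5 * x + 4.0
x = np.array([0.0, 2.0, 4.0])
y = np.array([5.0, 1.0, 3.0])
r2_toy = r_squared(y, -0.5 * x + 4.0)

# Prediction from the rounded 2b model reported above.
SLOPE, INTERCEPT = 2.68e-05, 3.41e+06
predicted_volume = SLOPE * 0.25e12 + INTERCEPT
print(f"R^2 on the 2a points = {r2_toy:.2f}")
print(f"predicted volume = {predicted_volume:.3g}")
```

On the 2a points the residuals are 1, -2, and 1, so the residual sum of squares is 6 against a total of 8, giving $R^2$ = 0.25. The prediction is 6.70e6 + 3.41e6, which agrees with the reported 1.01e+07 to three significant figures.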
2c: Make a multiple regression model.
Estimate the same volume from both market capitalization and price. That is, use $y$ = AvgVol vs. $x_1$ = MarketCap and $x_2$ = Price.
- Find the model.
- Print its equation.
- Print its $R^2$ value.
- Say what happens to Volume as Market Capitalization increases and what happens to Volume as Price increases. (Use a Markdown cell.)
In [7]:
# ... your code here ...
The model is Volume = 2.89e-05 * (Market Capitalization) + -6.69e+04 * Price + 1.44e+07. R^2 is 0.823.
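The same least-squares idea extends to two predictors by adding a column to the design matrix. The sketch below uses made-up, noise-free toy data (not the DJIA file) generated from known coefficients, so it only demonstrates the mechanics; the variable names and values are assumptions chosen for illustration.

```python
import numpy as np

# Made-up, noise-free toy data (NOT the DJIA file): y = 1 + 2*x1 - 3*x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
y = 1.0 + 2.0 * x1 - 3.0 * x2

# Design matrix: a column of ones for the intercept, then one column per feature.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares solution of X w ~ y.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
b, w1, w2 = w
print(f"y = {w1:.2f} * x1 + {w2:.2f} * x2 + {b:.2f}")
```

Each fitted coefficient is the change in the predicted $y$ per unit increase in that feature with the other feature held fixed, which is how the signs of the Market Capitalization and Price coefficients in the model above can be read.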