Introduction:
This post covers basic concepts in linear regression. I will show how to calculate the baseline prediction, SSE, SST, R2 and RMSE for a single-variable linear regression.
Dataset:
The following figure shows three data points and the best-fit regression line: y = 3x + 2.
The x-coordinate, or “x”, is our independent variable and the y-coordinate, or “y”, is our dependent variable.
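As a quick check of that line, here is a minimal Python sketch of an ordinary least squares fit for one variable. The x-values 0, 1 and 1 are an assumption: they are inferred from the predictions used later in this post (2, 5 and 5 on the line y = 3x + 2). The variable names are illustrative.

```python
# Assumed data: x-values inferred from the predictions 2, 5, 5 on y = 3x + 2.
x = [0, 1, 1]
y = [2, 2, 8]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least squares for a single variable:
# slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
# The fitted line always passes through the point (mean_x, mean_y).
intercept = mean_y - slope * mean_x

print(f"y = {slope:.2f}x + {intercept:.2f}")
```

Under that assumption about the x-values, the fit recovers a slope of 3 and an intercept of 2 (up to floating-point rounding).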
The baseline prediction is just the average of the values of the dependent variable. So in this case:
(2 + 2 + 8) / 3 = 4
It ignores the independent variable and predicts the same outcome for every data point. We’ll see in a minute why the baseline prediction is important.
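The same calculation as a small Python sketch, using the three y-values from the example. The variable names are my own and not part of any library.

```python
# Values of the dependent variable from the example.
y = [2, 2, 8]

# The baseline model ignores x and always predicts the mean of y.
baseline = sum(y) / len(y)
baseline_predictions = [baseline] * len(y)

print(baseline)              # 4.0
print(baseline_predictions)  # [4.0, 4.0, 4.0]
```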
Here’s what the baseline model would look like: a horizontal line at y = 4.
SSE:
SSE stands for Sum of Squared Errors.
An error is the difference between an actual value and the value the model predicts for it.
So SSE in this case, with predictions of 2, 5 and 5 from the regression line:
= (2 – 2)^2 + (2 – 5)^2 + (8 – 5)^2
= 0 + 9 + 9
= 18
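Here is a minimal Python sketch of the SSE calculation. The predicted values 2, 5 and 5 are taken from the worked example; the list names are illustrative.

```python
actual = [2, 2, 8]
predicted = [2, 5, 5]  # values on the regression line y = 3x + 2

# Error for each point: actual value minus predicted value.
errors = [a - p for a, p in zip(actual, predicted)]

# Square each error and add them up.
sse = sum(e ** 2 for e in errors)

print(errors)  # [0, -3, 3]
print(sse)     # 18
```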
SST:
SST stands for Total Sum of Squares.
Step 1 is to take the difference between each actual value of the dependent variable and the baseline prediction.
Step 2 is to square each difference and add them up.
So in this case:
= (2 – 4)^2 + (2 – 4)^2 + (8 – 4)^2
= 4 + 4 + 16
= 24
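The two steps can be sketched in Python as follows. This is only an illustration with names of my own choosing; it recomputes the baseline so the snippet runs on its own.

```python
actual = [2, 2, 8]

# Baseline prediction: the mean of the dependent variable.
baseline = sum(actual) / len(actual)

# Step 1: difference between each actual value and the baseline.
differences = [a - baseline for a in actual]

# Step 2: square each difference and add them up.
sst = sum(d ** 2 for d in differences)

print(differences)  # [-2.0, -2.0, 4.0]
print(sst)          # 24.0
```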
R2:
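Now R2 is 1 – (SSE/SST). It measures how much of the variation around the baseline the regression line explains: 1 is a perfect fit, and 0 means the model does no better than the baseline.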
So in this case:
= 1 – (18/24)
= 0.25
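A small Python sketch of the same formula. The helper name r_squared is hypothetical, written for this post rather than taken from a library. It also shows why the baseline matters: a model that only ever predicts the baseline scores an R2 of 0.

```python
def r_squared(actual, predicted):
    """Return 1 - SSE/SST for paired actual and predicted values."""
    baseline = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - baseline) ** 2 for a in actual)
    return 1 - sse / sst


actual = [2, 2, 8]

# Predictions from the regression line y = 3x + 2.
r2_line = r_squared(actual, [2, 5, 5])

# Predictions from the baseline model (always 4).
r2_baseline = r_squared(actual, [4, 4, 4])

print(r2_line)      # 0.25
print(r2_baseline)  # 0.0
```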
RMSE:
RMSE stands for Root Mean Squared Error. It can be computed using:
Square Root of (SSE/N) where N is the number of data points.
So in this case, it’s:
SQRT (18/3) = SQRT (6) ≈ 2.45
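And a minimal Python sketch for RMSE, again using the actual and predicted values from the example. The square root of 6 is about 2.449, so the printed value is rounded to two decimals.

```python
import math

actual = [2, 2, 8]
predicted = [2, 5, 5]  # values on the regression line y = 3x + 2

n = len(actual)  # number of data points
sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))

# RMSE: square root of the mean squared error.
rmse = math.sqrt(sse / n)

print(round(rmse, 2))  # 2.45
```

Because RMSE is in the same units as the dependent variable, it can be read as a typical size of the prediction error.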