| Metric | Formula | Description | Pros | Cons | Example (house-price model) | Goal |
|---|---|---|---|---|---|---|
| Mean Absolute Error (MAE) | \( \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \) | Average of the absolute differences between actual and predicted values. | Less sensitive to outliers than MSE and RMSE; expressed in the same unit as the target variable. | Ignores the direction of the error (over- vs. under-prediction) and, unlike MSE and RMSE, gives no extra weight to large errors. | An MAE of 20,000 USD means our house-price predictions are off by 20,000 USD on average. It does not bound any single prediction: for a house predicted at 500,000 USD, the actual price can fall outside 480,000 to 520,000 USD, because some errors are larger than the average and others smaller. | Minimize |
| Mean Squared Error (MSE) | \( \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \) | Average of the squared differences between actual and predicted values. | Emphasizes larger errors because of squaring. | Sensitive to outliers, since squaring magnifies large errors; reported in squared units, which are hard to interpret. | An MSE of 1,000,000,000 USD² means the average squared difference between predicted and actual house prices is 1,000,000,000 USD². Because squared dollars are hard to interpret, it is usually more helpful to look at the square root of the MSE (the RMSE) instead. | Minimize |
| Root Mean Squared Error (RMSE) | \( \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \) | Square root of the MSE. | Easier to interpret than MSE because it is in the same unit as the target variable. | Like MSE, sensitive to outliers, because squaring gives large errors more weight. | An RMSE of 31,623 USD (\( \sqrt{1{,}000{,}000{,}000} \approx 31{,}623 \)) means the root-mean-square prediction error is about 31,623 USD. Because large misses weigh more, RMSE is always at least as large as MAE. It equals the standard deviation of the prediction errors only when the errors average to zero (no systematic over- or under-prediction). | Minimize |
| Mean Absolute Percentage Error (MAPE) | \( \frac{100\%}{n} \sum_{i=1}^{n} \left\lvert \frac{y_i - \hat{y}_i}{y_i} \right\rvert \) | Average absolute percentage difference between observed and predicted values. | Scale-independent, so it is useful for comparing accuracy across variables or datasets of different scales. | Undefined when any actual value \( y_i \) is zero (division by zero) and unstable when actual values are close to zero. | A MAPE of 15% means the model's house-price predictions are off by 15% of the actual price, on average. | Minimize |
| R-Squared (R²) | \( 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} \) | Proportion of the variance in the dependent variable that is predictable from the independent variable(s). | Unitless; often read as the percentage of variance explained. | Measures fit relative to a baseline that always predicts the mean \( \bar{y} \), not the absolute size of the errors. On the training data of a least-squares linear model it never decreases when predictors are added, and it can be negative when a model does worse than the mean (for example, on held-out data). | An R² of 0.85 means the model explains 85% of the variability in house prices using its features. | Maximize |
| Adjusted R-Squared | \( 1 - (1 - R^2)\frac{n-1}{n-p-1} \), where \( n \) is the number of observations and \( p \) the number of predictors | R² adjusted for the number of predictors in the model. | Penalizes extra predictors, so it rises only when a new predictor improves the fit by more than would be expected by chance; useful for comparing models with different numbers of predictors. | Slightly harder to interpret than R²: it is no longer exactly the share of variance explained, and it can be negative. | A house-price model with 3 features and an adjusted R² of 0.82 explains roughly 82% of the variability in house prices after the penalty for using 3 predictors. | Maximize |
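The first three rows of the table can be checked with a short plain-Python sketch. The helper names (`mae`, `mse`, `rmse`) and the house prices below are hypothetical, chosen only for illustration. When every prediction misses by the same amount, MAE and RMSE agree; a single large miss then raises RMSE much more than MAE.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: the average of |y_i - y_hat_i|."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error: the average of (y_i - y_hat_i)^2, in squared units."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(MSE), back in the target's units."""
    return math.sqrt(mse(y_true, y_pred))


# Hypothetical house prices in USD (illustrative data, not from the text).
actual = [500_000, 320_000, 410_000, 275_000]

# Every prediction is off by exactly 20,000 USD.
predicted = [520_000, 300_000, 430_000, 255_000]
print(mae(actual, predicted))   # 20000.0
print(rmse(actual, predicted))  # 20000.0

# Same predictions, except the last house is now missed by 100,000 USD.
predicted_outlier = [520_000, 300_000, 430_000, 175_000]
print(mae(actual, predicted_outlier))          # 40000.0
print(round(rmse(actual, predicted_outlier)))  # 52915
```

The single 100,000 USD miss doubles MAE but raises RMSE by a factor of about 2.6, which is the outlier sensitivity described in the MSE and RMSE rows.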
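The RMSE row's remark about the standard deviation follows from splitting the mean squared error into a spread term and a bias term. Writing \( e_i = y_i - \hat{y}_i \) for the prediction errors, \( \bar{e} \) for their mean, and \( \sigma_e^2 = \frac{1}{n}\sum_{i=1}^{n}(e_i - \bar{e})^2 \) for their variance (notation introduced here, dividing by \( n \)), expanding the square gives \( \sum_{i=1}^{n}(e_i - \bar{e})^2 = \sum_{i=1}^{n} e_i^2 - n\bar{e}^2 \), so

\[
\begin{aligned}
\text{MSE} &= \frac{1}{n}\sum_{i=1}^{n} e_i^2 = \sigma_e^2 + \bar{e}^{2}, \\
\text{RMSE} &= \sqrt{\sigma_e^2 + \bar{e}^{2}}.
\end{aligned}
\]

RMSE therefore equals the standard deviation \( \sigma_e \) exactly when \( \bar{e} = 0 \); any systematic bias adds to it. For the example in the table, \( \sqrt{1{,}000{,}000{,}000} \approx 31{,}622.78 \), which rounds to the 31,623 USD quoted.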
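A minimal sketch of MAPE in plain Python follows; the function name `mape` and the prices are hypothetical. It raises an error instead of dividing by zero, and the second call shows how one small actual value inflates the metric, which is the weakness noted in the MAPE row.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, returned in percent.

    Raises ValueError if any actual value is zero, because the term
    (y_i - y_hat_i) / y_i is undefined there.
    """
    if any(y == 0 for y in y_true):
        raise ValueError("MAPE is undefined when an actual value is 0")
    total = sum(abs((y - p) / y) for y, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


# Hypothetical prices: both predictions are off by 15% of the actual price.
print(round(mape([400_000, 200_000], [460_000, 170_000]), 2))  # 15.0

# A tiny actual value dominates: a 2,000 USD miss on a 1,000 USD item is 200%.
print(round(mape([400_000, 1_000], [460_000, 3_000]), 2))  # 107.5
```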
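The last two rows can be sketched the same way. The function names and data are hypothetical; `adjusted_r_squared` takes R², the number of observations `n`, and the number of predictors `p`, exactly as in the table's formula. The calls show the mean-only baseline (R² = 0), a model that does worse than that baseline (negative R²), and how the same R² is penalized more heavily as `p` grows.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, with SS_tot measured around the mean of y_true."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (requires n > p + 1)."""
    if n <= p + 1:
        raise ValueError("adjusted R^2 needs n > p + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


y = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical targets, mean 3.0
print(r_squared(y, [3.0] * 5))                  # 0.0  (always predicts the mean)
print(r_squared(y, [5.0, 4.0, 3.0, 2.0, 1.0]))  # -3.0 (worse than the mean)

# Same R^2 = 0.85 on 50 hypothetical observations, with 3 vs. 20 predictors.
print(round(adjusted_r_squared(0.85, 50, 3), 3))   # 0.84
print(round(adjusted_r_squared(0.85, 50, 20), 3))  # 0.747
```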