Evaluation metrics for regression

Posted by Sagar A

Error:  predicted – actual values

How far the predicted value is from the actual value.

MAE: mean absolute error, the average of the absolute values of the errors

  • An error can be positive or negative. The sign indicates on which side of the best-fit line the actual value lies.
  • To drop the sign and keep only the magnitude, take the absolute value of each error and then average them.
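The two steps above (take absolute values, then average) can be sketched in a few lines of plain Python. The function name and the sample numbers are made up for illustration only.

```python
def mean_absolute_error(actual, predicted):
    """Average of |predicted - actual| over all samples."""
    errors = [p - a for a, p in zip(actual, predicted)]
    return sum(abs(e) for e in errors) / len(errors)

# Hypothetical values, chosen only to show positive and negative errors.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

# The errors are -0.5, 0.0, 2.0 and 1.0; their signs vanish in the average.
mae = mean_absolute_error(actual, predicted)
print(mae)
```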

MSE: mean squared error, the average of the squared errors (***Sensitive to outliers, so an unusually large MSE can signal their presence***)

  • A second way to remove negative signs: square each error, then take the average.
  • However, the result is in squared units of the target, so it cannot be read directly as a typical error size.

RMSE: root mean squared error. To bring MSE back to the units of the target, take the square root of MSE. (***Gives the spread of the residuals; it is analogous to their standard deviation, while MSE is analogous to their variance***)

  • RMSE is the most popular metric for regression.
  • It is most appropriate when the errors are assumed to follow a normal distribution and to be unbiased.
  • It is highly affected by outliers, so check for outliers and handle them before relying on this metric.
  • It is more reliable on large datasets.
  • Compared to MAE, RMSE gives higher weight to large errors and so punishes them more.
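To see how squaring makes RMSE react to a single large error more strongly than MAE does, here is a minimal sketch in plain Python. The helper names and the data are assumptions made for this illustration.

```python
import math

def mean_absolute_error(actual, predicted):
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)

def mean_squared_error(actual, predicted):
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    # Square root brings the value back to the units of the target.
    return math.sqrt(mean_squared_error(actual, predicted))

# Hypothetical data: in the second set of predictions one error is 10.
actual = [10.0, 12.0, 11.0, 13.0]
even_errors = [11.0, 11.0, 12.0, 12.0]    # every error is +1 or -1
with_outlier = [11.0, 11.0, 12.0, 23.0]   # last error is +10

mae_even = mean_absolute_error(actual, even_errors)
rmse_even = root_mean_squared_error(actual, even_errors)
mae_outlier = mean_absolute_error(actual, with_outlier)
mse_outlier = mean_squared_error(actual, with_outlier)
rmse_outlier = root_mean_squared_error(actual, with_outlier)

print(mae_even, rmse_even)        # identical when all errors have the same size
print(mae_outlier, rmse_outlier)  # RMSE grows faster than MAE
```

When all errors have the same magnitude the two metrics agree; once one error dominates, RMSE moves further than MAE.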


RMSLE: root mean squared logarithmic error

  • Here we take the log of the predicted and actual values before computing RMSE, so it measures the relative (ratio) difference between them rather than the absolute difference.
  • It is much less affected by outliers than RMSE.
  • The main use is when we don’t want to penalize huge differences between actual and predicted values when both are huge numbers.
  • The log scales down large values.
  1. If both predicted and actual values are small: RMSE and RMSLE are close to each other.
  2. If either the predicted or the actual value is big: RMSE > RMSLE
  3. If both predicted and actual values are big: RMSE > RMSLE (RMSLE becomes almost negligible by comparison)
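The comparison in the numbered list can be checked with a small sketch. It assumes the common definition of RMSLE that uses log(1 + value), so that zeros are allowed; the text above only says "log", so treat that detail as an assumption. The sample numbers are invented.

```python
import math

def rmse(actual, predicted):
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def rmsle(actual, predicted):
    # Assumption: log1p(x) = log(1 + x), a common convention for RMSLE.
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))

# Hypothetical small values: both metrics are of similar size.
small_actual, small_pred = [0.10, 0.20], [0.12, 0.18]
# Hypothetical large values with the same relative errors.
big_actual, big_pred = [1000.0, 2000.0], [1200.0, 1800.0]

print(rmse(small_actual, small_pred), rmsle(small_actual, small_pred))
print(rmse(big_actual, big_pred), rmsle(big_actual, big_pred))
```

With the large values RMSE is in the hundreds while RMSLE stays well below 1, because the log turns the differences into ratios.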

MAPE: Mean absolute percentage error

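  • The average of the absolute errors expressed as a percentage of the actual values; roughly a percentage version of MAE.
  • Both MAE and MAPE are more robust to outliers than MSE and RMSE.
  • MAPE is asymmetric: for the same absolute error, it is smaller when the actual value is larger than the predicted value and larger when the actual value is smaller than the predicted value, because the error is divided by the actual value.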
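The asymmetry in the last bullet is easy to verify. This is a minimal sketch with a hypothetical helper name and invented numbers; note that this form of MAPE is undefined when an actual value is zero.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    return 100.0 * sum(abs((p - a) / a) for a, p in zip(actual, predicted)) / len(actual)

# The absolute error is 50 in both cases; only the direction differs.
under_prediction = mape([100.0], [50.0])   # actual > predicted
over_prediction = mape([50.0], [100.0])    # actual < predicted

print(under_prediction)  # 50.0
print(over_prediction)   # 100.0
```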

MPE: Mean percentage error

  • Same as MAPE, except that it does not take the absolute value of the errors.
  • Because positive and negative errors can cancel out, MPE shows the bias of the model rather than the size of its errors.
  • MPE is a useful metric, since it allows us to see whether our model systematically underestimates (more negative) or overestimates (more positive).
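Here is a small sketch of how MPE exposes bias while MAPE does not. It keeps the sign convention used at the top of this post (error = predicted minus actual), so a positive MPE means overestimation. The helper names and data are assumptions for illustration.

```python
def mpe(actual, predicted):
    """Mean percentage error, in percent; the sign is kept."""
    return 100.0 * sum((p - a) / a for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent; the sign is dropped."""
    return 100.0 * sum(abs((p - a) / a) for a, p in zip(actual, predicted)) / len(actual)

actual = [100.0, 200.0, 50.0]
always_high = [110.0, 220.0, 55.0]   # every prediction is 10% too high
mixed = [110.0, 180.0, 50.0]         # +10%, -10%, 0%: no systematic bias

print(mpe(actual, always_high), mape(actual, always_high))
print(mpe(actual, mixed), mape(actual, mixed))
```

For the consistently high predictions MPE and MAPE agree; for the mixed predictions MPE is close to zero even though MAPE is not, which is why MPE is read as a bias indicator and not as an accuracy score.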


R2 (coefficient of determination):

  • R2 tells us how much of the variability of the dependent variable is explained by the model.
  • If MSE or RMSE decreases, model performance improves. But these values alone are not intuitive.
  • In classification, suppose the accuracy of a model is 0.8. We can test it against a random model whose accuracy is 0.5; the random model is treated as our benchmark.
  • With RMSE, there is no such benchmark model to compare against.
  • R2 provides one: it compares the model with a baseline that always predicts the mean.
  • R2 = 1 - MSE(Model) / MSE(Baseline)
  • MSE(Model): MSE of the predictions against the actual values
  • MSE(Baseline): MSE of the mean prediction against the actual values
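The comparison of model MSE against the mean-prediction baseline can be written out as R2 = 1 - MSE(Model) / MSE(Baseline). The sketch below follows that definition; the function names and the data are hypothetical.

```python
def mse(actual, predicted):
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def r2_score(actual, predicted):
    # Baseline model: always predict the mean of the actual values.
    mean_actual = sum(actual) / len(actual)
    baseline = [mean_actual] * len(actual)
    return 1.0 - mse(actual, predicted) / mse(actual, baseline)

actual = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]

r2 = r2_score(actual, predicted)
r2_of_baseline = r2_score(actual, [3.0] * 5)   # predicting the mean gives 0

print(r2, r2_of_baseline)
```

A value near 1 means the model is far better than the mean baseline, 0 means it is no better, and a negative value means it is worse.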

Adjusted R2:

  • When we add more independent variables to the model, R2 never decreases; it stays the same or increases, because it does not account for whether the added variables are significant.
  • Adjusted R2 penalizes the number of variables. If an added feature is genuinely related to the output, adjusted R2 increases. If the feature is not significant, adjusted R2 decreases slightly.
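The post does not give the formula, so the sketch below assumes the standard one: adjusted R2 = 1 - (1 - R2)(n - 1) / (n - p - 1), with n samples and p features. The R2 values are invented to show a feature that barely helps.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Standard adjusted R2; requires n_samples > n_features + 1."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# Hypothetical scenario: a fourth feature lifts R2 only from 0.800 to 0.801.
adj_with_3 = adjusted_r2(0.800, n_samples=50, n_features=3)
adj_with_4 = adjusted_r2(0.801, n_samples=50, n_features=4)

print(adj_with_3, adj_with_4)   # adjusted R2 drops although R2 went up
```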

Evaluation metrics for Classification:

In classification problems, there are two types of algorithms, depending on their output:

  1. Class output: Algorithms like SVM and KNN create a class output. For binary classification, the output will be 0 or 1.
  2. Probability output: Algorithms like logistic regression, Random Forest, gradient boosting, AdaBoost etc. give probability outputs. Converting probability outputs to class outputs is just a matter of choosing a threshold probability.

For a balanced dataset, accuracy is used as an evaluation metric. (Balanced dataset: the classes are represented in roughly equal proportions.)

Recall / precision: used for an imbalanced dataset.
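Turning probability outputs into class outputs can be sketched as below. The function name, the default threshold of 0.5 and the probabilities are assumptions for illustration; a real project would choose the threshold based on the cost of each kind of error.

```python
def to_class_labels(probabilities, threshold=0.5):
    """Label a sample 1 when its probability reaches the threshold, else 0."""
    return [1 if p >= threshold else 0 for p in probabilities]

# Hypothetical predicted probabilities of the positive class.
probabilities = [0.10, 0.45, 0.50, 0.80]

labels_default = to_class_labels(probabilities)
labels_strict = to_class_labels(probabilities, threshold=0.7)

print(labels_default)  # [0, 0, 1, 1]
print(labels_strict)   # [0, 0, 0, 1]
```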

Confusion Matrix: 

TP = Classifier predicted TRUE (patient has the disease), and the correct class was TRUE (patient has the disease)

TN = Classifier predicted FALSE (no disease), and the correct class was FALSE (patient does not have the disease)

FP (Type I Error) = Classifier predicted TRUE, but the correct class was FALSE (patient did not have the disease). Ex.: producer's risk

FN (Type II Error) = Classifier predicted FALSE (no disease), but the correct class was TRUE (patient actually has the disease). Ex.: consumer's risk

Accuracy: (TP + TN) / (TP + TN + FN + FP)

Misclassification rate (Error rate): (FP + FN) / (TP + TN + FN + FP)
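To make the four cells concrete, here is a minimal sketch that counts them from two lists of 0/1 labels and then applies the two formulas above. The function name and the labels are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Return (TP, TN, FP, FN) for binary labels where 1 is the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn

# Hypothetical labels: 1 = has disease, 0 = no disease.
actual =    [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

tp, tn, fp, fn = confusion_counts(actual, predicted)
total = tp + tn + fp + fn
accuracy = (tp + tn) / total
error_rate = (fp + fn) / total

print(tp, tn, fp, fn)          # 3 3 1 1
print(accuracy, error_rate)    # 0.75 0.25
```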

Precision: TP / (TP + FP)

  • Out of all predicted positives, how many were actually positive.
  • When optimizing for precision, we try to reduce FP errors, usually at the cost of more FN.

Recall: TP / (TP + FN)

  • Out of all actual positives, how many did we predict as positive.
  • When optimizing for recall, we try to reduce FN, usually at the cost of more FP.
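Both formulas are simple ratios of confusion-matrix counts. The sketch below uses invented counts for a cautious classifier that raises few false alarms but misses many positives; returning 0.0 for an empty denominator is a choice made here, not something the post prescribes.

```python
def precision(tp, fp):
    # Assumption: report 0.0 when nothing was predicted positive.
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

def recall(tp, fn):
    # Assumption: report 0.0 when there are no actual positives.
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0

# Hypothetical counts: few false positives, many false negatives.
tp, fp, fn = 8, 2, 8

p = precision(tp, fp)
r = recall(tp, fn)

print(p)  # 0.8
print(r)  # 0.5
```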

F Beta:

  • The beta value may vary; common choices are 1, 0.5 and 2.
  • The F beta score is the weighted harmonic mean of precision and recall for a classification problem. With beta = 1 it is the plain harmonic mean.

Harmonic mean = 2xy / (x + y)

  • B = 1: when FP and FN are equally important to the model.
  • B = 0.5: when FP has a greater impact than FN, precision matters more, so B should decrease. Choose B between 0 and 1 (0.5 is the usual choice).
  • B > 1: when FN has a greater impact than FP, recall matters more, so B should increase (2 is a common choice).
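The effect of beta can be seen with the usual formula F beta = (1 + B^2) * precision * recall / (B^2 * precision + recall), which is not written out in the post and is assumed here. The precision and recall values are invented.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall."""
    b2 = beta ** 2
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0   # choice made for this sketch when both inputs are 0
    return (1 + b2) * precision * recall / denominator

# Hypothetical classifier with higher precision than recall.
precision, recall = 0.8, 0.5

f1 = f_beta(precision, recall, beta=1.0)
f_half = f_beta(precision, recall, beta=0.5)   # leans toward precision
f2 = f_beta(precision, recall, beta=2.0)       # leans toward recall

print(f_half, f1, f2)
```

Because precision is the stronger of the two here, the score is highest for beta = 0.5 and lowest for beta = 2.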


Log loss:

Gini Coefficients:

Bias, Underfitting, Overfitting


Bias / Underfitting:

  • Bias is how far the predicted values are from the actual values, i.e. if the average predicted values are far off from the actual values, then bias is high → that means the model underfits the data.
  • The model fails to capture the relationship between input and output.


Overfitting:

  • The model performs well on the training data but poorly on test data or other unseen data.


Regularization is used to find the best-fit line in regression while reducing the overfitting problem.


Ridge:

  • It reduces overfitting by adding some bias, i.e. by penalizing a steeper slope of the best-fit line.
  • In ridge, coefficients are shrunk toward zero but not exactly to zero, because the penalty uses the square of the slope.

Ridge cost = SSE (cost function) + penalty * (slope^2)

where penalty is the regularization strength and (slope^2) is the penalty term.
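The ridge cost can be tried out on a toy line. This sketch assumes a single feature and a line through the origin (no intercept) to keep it short; the function name, the data and the penalty value are made up.

```python
def ridge_cost(x, y, slope, penalty):
    """SSE of the line y = slope * x, plus penalty * slope^2."""
    sse = sum((slope * xi - yi) ** 2 for xi, yi in zip(x, y))
    return sse + penalty * slope ** 2

# Hypothetical data lying exactly on the line y = 2x.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]

cost_exact_fit = ridge_cost(x, y, slope=2.0, penalty=1.0)   # SSE is 0, penalty is 4
cost_flatter = ridge_cost(x, y, slope=1.9, penalty=1.0)     # small SSE, smaller penalty

print(cost_exact_fit, cost_flatter)
```

The slightly flatter line has the lower total cost, which is the "adding some bias" effect: ridge accepts a small training error in exchange for a smaller slope.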

Lasso: is used to reduce overfitting and also performs feature selection.

  • Because the penalty uses the magnitude of the slope, the coefficients of some features are reduced exactly to zero.

Lasso cost = SSE (cost function) + penalty * | slope |

where | slope | is the magnitude of the slope.
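Why lasso can push a coefficient exactly to zero while ridge only shrinks it can be shown on the simplest possible case. The sketch below assumes one feature and no intercept, where both cost functions can be minimized in closed form: ridge gives slope = Sxy / (Sxx + penalty), and lasso gives a soft-thresholded slope. The function names and data are hypothetical, and real libraries solve the multi-feature case iteratively.

```python
import math

def ridge_slope(x, y, penalty):
    """Minimizer of SSE + penalty * slope^2 for the line y = slope * x."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + penalty)

def lasso_slope(x, y, penalty):
    """Minimizer of SSE + penalty * |slope| for the line y = slope * x."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    # Soft-thresholding: a weak relationship is cut to exactly zero.
    shrunk = max(abs(sxy) - penalty / 2.0, 0.0)
    return math.copysign(shrunk, sxy) / sxx

x = [1.0, 2.0, 3.0]
weak_y = [0.1, -0.1, 0.1]    # hypothetical feature with almost no signal
strong_y = [2.0, 4.0, 6.0]   # hypothetical feature with a clear signal

print(ridge_slope(x, weak_y, 1.0), lasso_slope(x, weak_y, 1.0))
print(ridge_slope(x, strong_y, 1.0), lasso_slope(x, strong_y, 1.0))
```

For the weak feature ridge keeps a small non-zero slope while lasso returns exactly zero, which is how lasso ends up doing feature selection.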
