MACHINE LEARNING

Evaluation metrics for regression

Posted by Sagar A

Error: predicted value – actual value

It tells how far the predicted value is from the actual value.

MAE: average of the absolute value of the error

  • Error takes both positive and negative values; the sign indicates the direction of the prediction relative to the best-fit line.
  • To remove the sign and keep only the magnitude, we take the absolute value of each error and then average, so the negative signs disappear.
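As a quick illustration, here is a minimal NumPy sketch (the array values are made up):

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])  # actual values (made up)
y_pred = np.array([2.5, 5.0, 4.0, 8.0])  # predicted values (made up)

# MAE: average of the absolute errors
mae = np.mean(np.abs(y_pred - y_true))
print(mae)  # 0.75
```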

MSE: average of the squared error (***detects the presence of outliers***)

  • A second way to remove the negative sign: square each error value and then take the average.
  • But the output is in squared units, so it does not read as a precise error value.
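Continuing the same sketch with the same made-up arrays:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# MSE: average of the squared errors; note the result is in squared units
mse = np.mean((y_pred - y_true) ** 2)
print(mse)  # 0.875
```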

RMSE: To compensate for this, take the square root of MSE. (***gives the spread of the residuals and is analogous to the standard deviation, i.e. it reflects the variance of the errors***)

  • RMSE is the most popular metric for regression.
  • It assumes the errors are unbiased and follow a normal distribution.
  • It is highly affected by outliers, so make sure to remove outliers from the dataset before using this metric.
  • Reliable for large datasets.
  • Compared to MAE, RMSE gives higher weightage to, and punishes, large errors.
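A minimal sketch, again with made-up values:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# RMSE: square root of MSE, so the result is back in the original units of y
rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
print(rmse)  # ~0.935
```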

RMSLE

  • Here we take the log of the predicted and actual values, so the metric measures relative changes rather than absolute ones.
  • It is not strongly affected by outliers.
  • The main goal is to avoid penalizing huge differences between actual and predicted values when both are huge numbers.
  • The log scales down the largest values.
  1. If both the predicted and actual values are small: RMSE and RMSLE are about the same.
  2. If either the predicted or the actual value is big: RMSE > RMSLE.
  3. If both the predicted and actual values are big: RMSE > RMSLE (RMSLE becomes almost negligible).
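A minimal sketch; log1p (i.e. log(1 + y)) is a common implementation choice since it also tolerates zero values, and the numbers here are made up:

```python
import numpy as np

y_true = np.array([60.0, 80.0, 90.0])
y_pred = np.array([67.0, 78.0, 91.0])

# RMSLE: RMSE computed on log(1 + y), which scales down the largest values
rmsle = np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))
print(rmsle)  # ~0.065
```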

MAPE: Mean absolute percentage error

  • Equivalent to MAE expressed as a percentage.
  • Both MAE and MAPE are robust to the effect of outliers.
  • When the actual value is more than the predicted one, MAPE is lower; when the actual value is less than the predicted one, MAPE is higher.
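A minimal sketch with made-up numbers:

```python
import numpy as np

y_true = np.array([100.0, 200.0, 50.0])
y_pred = np.array([110.0, 180.0, 55.0])

# MAPE: mean of |error| / |actual|, expressed as a percentage
mape = np.mean(np.abs((y_pred - y_true) / y_true)) * 100
print(mape)  # 10.0
```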

MPE: Mean percentage error

  • Same as MAPE; the only difference is that it lacks the absolute value.
  • MPE shows the bias of the model.
  • MPE is useful since it allows us to see whether our model systematically underestimates (more negative) or overestimates (more positive).
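The same sketch without the absolute value (the sign convention here is chosen so that overestimates come out positive, matching the description above):

```python
import numpy as np

y_true = np.array([100.0, 200.0, 50.0])
y_pred = np.array([110.0, 180.0, 55.0])

# MPE: like MAPE but signed, so systematic bias stays visible
mpe = np.mean((y_pred - y_true) / y_true) * 100
print(mpe)  # ~3.33 -> on average this model overestimates slightly
```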

R2: 

  • R2 tells us how much of the variability of the dependent variable is explained by the independent variables in the model.
  • As MSE or RMSE decreases, model performance increases. But these values alone are not intuitive.
  • In the case of classification models, if a model's accuracy is 0.8, we can test it against a random model whose accuracy is 0.5; this random model is treated as our benchmark.
  • But for RMSE there is no benchmark model to compare against.
  • We can use R2 to make this comparison: R2 = 1 – MSE(Model) / MSE(Baseline)
  • MSE(Model): MSE of the predictions against the actual values
  • MSE(Baseline): MSE of the mean prediction against the actual values
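Putting the two MSEs together, a minimal sketch of the comparison (values made up):

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 4.9, 3.0, 6.5])

mse_model = np.mean((y_true - y_pred) ** 2)            # MSE of our predictions
mse_baseline = np.mean((y_true - y_true.mean()) ** 2)  # MSE of always predicting the mean
r2 = 1 - mse_model / mse_baseline
print(r2)  # ~0.957: the closer to 1, the more the model beats the mean baseline
```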

Adjusted R2:

  • When we add more independent variables to the model, R2 always increases and never decreases, since it does not take into account whether the added variables are significant or correlated with the output.
  • Adjusted R2 does take care of significant/correlated variables. If an added feature is correlated with the output, the adjusted R2 value increases; otherwise, if the feature is not significant, the adjusted R2 value slightly decreases.
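The usual formula is Adjusted R2 = 1 – (1 – R2) · (n – 1) / (n – p – 1), where n is the number of samples and p the number of independent variables. A quick sketch with assumed numbers:

```python
# assumed values: 100 samples, 5 features, raw R2 of 0.85
n, p, r2 = 100, 5, 0.85

adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(adj_r2)  # ~0.842, slightly below the raw R2
```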

Evaluation metrics for Classification:

In a classification problem, there are two types of algorithms depending on their outputs:

  1. Class output: Algorithms like SVM and KNN produce a class output. For binary classification, the output will be 0 or 1.
  2. Probability output: Algorithms like logistic regression, random forest, gradient boosting, AdaBoost, etc. give probability outputs. Converting a probability output to a class output is just a matter of choosing a threshold probability, as in the sketch below.
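A minimal sketch of that thresholding step (the probabilities and the 0.5 cut-off are assumed; the threshold is a modelling choice):

```python
import numpy as np

proba = np.array([0.10, 0.40, 0.55, 0.80])  # probability outputs (made up)
labels = (proba >= 0.5).astype(int)         # class outputs: [0, 0, 1, 1]
```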

For a balanced dataset, accuracy is used as the evaluation metric. (Balanced dataset – no bias in the class distribution.)

Recall/precision: used on an imbalanced dataset.

Confusion Matrix: 

TP = the classifier predicted TRUE (they have the disease), and the correct class was TRUE (the patient has the disease).

TN = the model predicted FALSE (no disease), and the correct class was FALSE (the patient does not have the disease).

FP (Type I error) = the classifier predicted TRUE, but the correct class was FALSE (the patient did not have the disease). E.g. producer's risk.

FN (Type II error) = the classifier predicted FALSE (the patient does not have the disease), but the correct class was TRUE (they actually do have the disease). E.g. consumer's risk.

Accuracy: (TP + TN) / (TP + TN + FP + FN)

Misclassification rate (error rate): (FP + FN) / (TP + TN + FP + FN)

Precision: TP / (TP + FP)

  • Out of the total predicted positive values, how many were actually positive.
  • With precision we are trying to reduce FP errors, but FN may be higher.

Recall: TP / (TP + FN)

  • Out of the total actual positive values, how many did we predict as positive.
  • With recall we are trying to reduce FN errors, but FP may be higher.
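A minimal scikit-learn sketch tying these formulas together (the labels are made up):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # 1 = patient has the disease (made up)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy  = (tp + tn) / (tp + tn + fp + fn)  # 0.75
precision = tp / (tp + fp)                   # 0.75: of predicted positives, how many were right
recall    = tp / (tp + fn)                   # 0.75: of actual positives, how many we caught
```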

F Beta :

  • The beta value may vary; it can be 1, 0.5, or 2.
  • The F-beta score is a weighted harmonic mean of the precision and recall of a classification model; for beta = 1 it reduces to the plain harmonic mean (the F1 score).

Harmonic mean = 2xy / (x + y)

  • B = 1 …. whenever FP and FN are equally important to the model.
  • B = 0.5 …. when FP has a greater impact than FN, we favour precision; B should decrease, chosen between 0 and 1 (most often 0.5).
  • B > 1 …. when FN has a greater impact than FP, we favour recall; the B value should increase (most often 2).
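In general F-beta = (1 + B²) · precision · recall / (B² · precision + recall). A minimal sketch using scikit-learn's fbeta_score, reusing the made-up labels from above:

```python
from sklearn.metrics import fbeta_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

f1  = fbeta_score(y_true, y_pred, beta=1)    # FP and FN equally important
f05 = fbeta_score(y_true, y_pred, beta=0.5)  # weights precision more
f2  = fbeta_score(y_true, y_pred, beta=2)    # weights recall more
```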

AUC-ROC:

Log loss:

Gini Coefficients:

Bias, Underfitting , Overfitting

Bias: 

  • How far the predicted values are from the actual ones, i.e. if the average predicted value is far off from the actual value, then bias is high → the model underfits the data.
  • With high bias, the model captures little to no relation between input and output.

Variance:

  • If a model performs well on the training data but not on other data or the test data, variance is high → the model overfits the data.

Regularization: 

Regularization is used to create the best-fit line in regression and to reduce the overfitting problem.

Ridge

  • It is used to reduce overfitting by adding some bias, i.e. by penalizing a steeper slope of the best-fit line.
  • In ridge, feature coefficients shrink close to zero but never exactly to zero, because the slope is squared.

Cost function = SSE + penalty * (slope)^2

where penalty * (slope)^2 is the penalty term
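A minimal scikit-learn sketch (the data is synthetic, and alpha, which plays the role of the penalty above, is an assumed value):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # synthetic features
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 2.0]) + rng.normal(scale=0.1, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)  # alpha = penalty strength
print(ridge.coef_)  # coefficients shrink toward, but not exactly to, zero
```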

Lasso: used to reduce overfitting and also for feature selection.

  • Because the penalty uses the magnitude of the slope, some feature coefficients are reduced exactly to zero.

Cost function = SSE + penalty * |slope|

where |slope| = magnitude of the slope
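The same synthetic setup with Lasso (alpha again an assumed value):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # synthetic features
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 2.0]) + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)  # |slope| penalty
print(lasso.coef_)  # some coefficients come out exactly 0 -> feature selection
```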
