Error: predicted value – actual value

How far the predicted value is from the actual value.

MAE: average of the absolute values of the errors

• An error can be positive or negative. The sign indicates on which side of the best-fit line the actual value lies.
• To drop the sign and keep only the magnitude, take the absolute value of each error and then average them.

MSE: average of the squared errors (***helps detect the presence of outliers, because squaring magnifies large errors***)

• A second way to remove the negative signs: square each error and then take the average.
• But the result is in squared units of the target, so it cannot be read directly as a typical error size.

RMSE: To compensate for this, take the square root of MSE. (***gives the spread of the residuals; analogous to the standard deviation, i.e. the square root of the variance***)

• RMSE is the most popular metric for regression.
• It assumes the errors are roughly normally distributed and unbiased (mean close to zero).
• It is highly affected by outliers, so check for and handle outliers in the dataset before relying on it.
• It is more reliable on large datasets.
• Compared with MAE, RMSE gives more weight to large errors and punishes them more heavily.
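
The three metrics above can be sketched in a few lines. This is a minimal illustration using only the standard library; the function names and the sample values are made up for the example, and the last prediction is a deliberate large miss to show how squaring pulls RMSE above MAE.

```python
import math

def errors(actual, predicted):
    # Error as defined in these notes: predicted - actual
    return [p - a for a, p in zip(actual, predicted)]

def mae(actual, predicted):
    # Mean of the absolute errors
    errs = errors(actual, predicted)
    return sum(abs(e) for e in errs) / len(errs)

def mse(actual, predicted):
    # Mean of the squared errors (squared units of the target)
    errs = errors(actual, predicted)
    return sum(e * e for e in errs) / len(errs)

def rmse(actual, predicted):
    # Square root of MSE, back in the units of the target
    return math.sqrt(mse(actual, predicted))

# Hypothetical data: errors are 1, -1, 1 and one large miss of 10
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 26.0]

print("MAE :", mae(actual, predicted))   # 3.25
print("MSE :", mse(actual, predicted))   # 25.75
print("RMSE:", rmse(actual, predicted))  # larger than MAE because of the one big error
```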

RMSLE: Root mean squared logarithmic error

• Here we take the log of the predicted and actual values (usually log(1 + value)) before computing RMSE, so it measures the relative rather than the absolute difference.
• It is much less affected by outliers than RMSE.
• The main use is when we don't want to penalize a huge difference between actual and predicted values when both are huge numbers.
• The log scales down the largest values.
1. If both predicted and actual values are small: RMSE and RMSLE are approximately the same.
2. If either the predicted or the actual value is big: RMSE > RMSLE
3. If both predicted and actual values are big: RMSE > RMSLE (RMSLE becomes almost negligible)
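
The small-versus-big comparison above can be checked numerically. This is a rough sketch, assuming RMSLE is computed with log(1 + value); the function names and sample numbers are illustrative only.

```python
import math

def rmse(actual, predicted):
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)

def rmsle(actual, predicted):
    # log1p(x) = log(1 + x), so zeros are allowed; values must be greater than -1
    n = len(actual)
    sq = sum((math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(sq / n)

# Hypothetical small values: the two metrics are close
small_actual, small_pred = [0.01, 0.02], [0.02, 0.01]
# Hypothetical big values: RMSE is large, RMSLE stays small
big_actual, big_pred = [1000.0, 2000.0], [1100.0, 1900.0]

print("small:", rmse(small_actual, small_pred), rmsle(small_actual, small_pred))
print("big  :", rmse(big_actual, big_pred), rmsle(big_actual, big_pred))
```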

MAPE: Mean absolute percentage error

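• The percentage version of MAE: the average of |error| / |actual value|, expressed as a percentage.
• Both MAE and MAPE are more robust to outliers than MSE and RMSE.
• For non-negative values, when the actual value is more than the predicted value the percentage error is at most 100%, so MAPE is lower; when the actual value is less than the predicted value the percentage error has no upper limit, so MAPE is higher.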

MPE: Mean percentage error

• Same as MAPE. The only difference is that it does not take the absolute value.
• MPE shows the bias of the model.
• MPE is a useful metric, since it lets us see whether our model systematically underestimates (negative MPE) or overestimates (positive MPE), with error defined as predicted – actual.
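
A short sketch of both percentage metrics, keeping the notes' sign convention (error = predicted – actual). The function names and data are hypothetical, and actual values are assumed to be non-zero.

```python
def mape(actual, predicted):
    # Mean absolute percentage error; actual values must be non-zero
    n = len(actual)
    return 100.0 * sum(abs((p - a) / a) for a, p in zip(actual, predicted)) / n

def mpe(actual, predicted):
    # Mean percentage error; the sign is kept, so opposite errors can cancel
    n = len(actual)
    return 100.0 * sum((p - a) / a for a, p in zip(actual, predicted)) / n

actual = [100.0, 200.0, 400.0]

# Every prediction is 10% too low: MAPE is about 10, MPE is about -10
under = [90.0, 180.0, 360.0]
print(mape(actual, under), mpe(actual, under))

# One prediction 10% high, one 10% low, one exact: MPE is about 0, MAPE is not
mixed = [110.0, 180.0, 400.0]
print(mape(actual, mixed), mpe(actual, mixed))
```

The second case shows why MPE is read as a bias indicator rather than an accuracy measure: errors in opposite directions cancel.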

R2:

• R2 tells us how much of the variability of the dependent (target) variable is explained by the model.
• If MSE or RMSE decreases, model performance increases. But these values alone are not intuitive.
• Classification models have a natural benchmark: if our model's accuracy is 0.8, we can test it against a random model whose accuracy is 0.5, and this random model is treated as our benchmark.
• But for RMSE there is no benchmark model to compare against.
• We can use R2 to compare the model with a baseline that always predicts the mean: R2 = 1 – MSE(Model) / MSE(Baseline)
• MSE(Model): MSE of the predictions against the actual values
• MSE(Baseline): MSE of the mean prediction against the actual values

• When we add more independent variables to the model, R2 never decreases (it stays the same or increases), because it does not check whether the added variables are significant.
• Adjusted R2 accounts for this by penalizing the number of features. If an added feature is genuinely related to the output, adjusted R2 increases. If the feature is not significant, adjusted R2 decreases slightly.
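
The R2 and adjusted R2 ideas above can be sketched as follows. The adjusted R2 formula used here is the standard one, 1 – (1 – R2)(n – 1)/(n – p – 1), which the notes do not spell out; the function names and data are illustrative.

```python
def r2_score(actual, predicted):
    n = len(actual)
    mean_actual = sum(actual) / n
    # MSE(Model): predictions against the actual values
    mse_model = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    # MSE(Baseline): always predicting the mean of the actual values
    mse_baseline = sum((a - mean_actual) ** 2 for a in actual) / n
    return 1.0 - mse_model / mse_baseline

def adjusted_r2(r2, n_samples, n_features):
    # Penalizes extra features; requires n_samples > n_features + 1
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

actual = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]

r2 = r2_score(actual, predicted)
print("R2:", r2)
# Same R2, but reported with more features: adjusted R2 drops
print("adjusted, 1 feature :", adjusted_r2(r2, n_samples=5, n_features=1))
print("adjusted, 2 features:", adjusted_r2(r2, n_samples=5, n_features=2))
```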

Evaluation metrics for Classification:

In classification problems, there are two types of algorithms depending on their outputs:

1. Class output: Algorithms like SVM and KNN create a class output. For binary classification, the output will be 0 or 1.
2. Probability output: Algorithms like logistic regression, random forest, gradient boosting, AdaBoost, etc. give probability outputs. Converting probability outputs to class outputs is just a matter of choosing a threshold probability.

For a balanced dataset, accuracy is used as an evaluation metric. (Balanced dataset – the classes are roughly equally represented, so there is no bias toward one class.)

Recall / precision: used for an imbalanced dataset.
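
Converting probabilities to classes with a threshold can be sketched like this. The helper name and the sample probabilities are made up for illustration; 0.5 is only a common default, not a rule.

```python
def to_class_labels(probabilities, threshold=0.5):
    # Predict class 1 when the probability reaches the threshold, else class 0
    return [1 if p >= threshold else 0 for p in probabilities]

probs = [0.10, 0.45, 0.50, 0.80]

print(to_class_labels(probs))                 # [0, 0, 1, 1]
print(to_class_labels(probs, threshold=0.4))  # [0, 1, 1, 1]
```

Lowering the threshold turns more cases into predicted positives, which tends to raise recall and lower precision.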

Confusion Matrix:

TP = Classifier predicted TRUE (patient has the disease), and the correct class was TRUE (patient does have the disease).

TN = Classifier predicted FALSE (no disease), and the correct class was FALSE (patient does not have the disease).

FP (Type I Error) = Classifier predicted TRUE, but the correct class was FALSE (patient did not have the disease). E.g. producer's risk.

FN (Type II Error) = Classifier predicted FALSE (no disease), but the correct class was TRUE (patient actually does have the disease). E.g. consumer's risk.

Accuracy: (TP + TN) / (TP + TN + FN + FP)

Misclassification rate (Error rate): (FP + FN) / (TP + TN + FN + FP)

Precision: TP / (TP + FP)

• Out of all predicted positives, how many were actually positive.
• When we optimize for precision we are trying to reduce FP, usually at the cost of a higher FN.

Recall: TP / (TP + FN)

• Out of all actual positives, how many we predicted as positive.
• When we optimize for recall we are trying to reduce FN, usually at the cost of a higher FP.
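
The confusion-matrix counts and the ratios built from them can be sketched as follows, assuming binary labels with 1 = positive (has disease) and 0 = negative. The function names and the ten sample labels are hypothetical.

```python
def confusion_counts(actual, predicted):
    # Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # Type I error
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # Type II error
    return tp, tn, fp, fn

def classification_metrics(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        # Guard against division by zero when nothing is predicted/actually positive
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(actual, predicted)
print(counts)                           # (3, 5, 1, 1)
print(classification_metrics(*counts))
```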

F Beta :

• The beta value can vary; common choices are 1, 0.5, and 2.
• The F-beta score is the weighted harmonic mean of precision and recall for a classification problem. With beta = 1 it is the plain harmonic mean (the F1 score).

Harmonic mean = 2xy / (x + y)

• B = 1: use when FP and FN are equally important to the model.
• B = 0.5: use when FP has a greater impact than FN, so precision matters more; B should decrease to a value between 0 and 1 (0.5 is the usual choice).
• B > 1: use when FN has a greater impact than FP, so recall matters more; B should increase (2 is a common choice).
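
The effect of beta can be seen with the standard F-beta formula, (1 + B^2) * precision * recall / (B^2 * precision + recall), which the notes do not write out. A minimal sketch with a hypothetical model whose precision is higher than its recall:

```python
def f_beta(precision, recall, beta=1.0):
    # Weighted harmonic mean of precision and recall
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical scores: precision is good, recall is weaker
precision, recall = 0.9, 0.6

print("F1  :", f_beta(precision, recall, beta=1.0))  # plain harmonic mean
print("F0.5:", f_beta(precision, recall, beta=0.5))  # leans toward precision
print("F2  :", f_beta(precision, recall, beta=2.0))  # leans toward recall
```

Because precision is the stronger of the two here, F0.5 comes out above F1 and F2 comes out below it.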

AUC-ROC:

Log loss:

Gini Coefficients:

Bias, Underfitting, Overfitting

Bias:

• How far the predicted values are from the actual values on average, i.e. if the average predicted values are far off from the actual values then bias is high → that means the model underfits the data.
• A high-bias model fails to capture the relation between input and output.

Variance:

• High variance: the model performs well on training data but poorly on test or other unseen data, i.e. it overfits.

Regularization:

Regularization adds a penalty to the cost function when fitting the best-fit line in regression, in order to reduce overfitting.

Ridge

• It is used to reduce overfitting by adding some bias, i.e. by penalizing a steeper slope of the best-fit line.
• In ridge, coefficients shrink toward zero but do not become exactly zero, because the penalty uses the squared slope.

SSE (Cost function) + penalty * (slope^2)

Where (slope^2) is the squared slope (summed over all features) and penalty controls how strongly it is penalized.
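
To make the shrinking effect concrete, here is a minimal sketch for the simplest case: one feature, no intercept (the data is assumed to be centered). Under that assumption, minimizing SSE + penalty * slope^2 gives slope = sum(x*y) / (sum(x*x) + penalty). The function name and data are illustrative.

```python
def ridge_slope(x, y, penalty):
    # Closed-form minimizer of sum((y - slope*x)^2) + penalty * slope^2
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + penalty)

# Hypothetical centered data with a true slope of about 2
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.1, -1.9, 0.0, 2.1, 3.9]

for penalty in (0.0, 10.0, 1000.0):
    print(penalty, ridge_slope(x, y, penalty))
```

With penalty = 0 this is ordinary least squares; larger penalties pull the slope toward zero, but it stays non-zero for any finite penalty.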

Lasso: is used to reduce overfitting and also for feature selection.

• Because the penalty uses the magnitude (absolute value) of the slope, some coefficients are reduced to exactly zero, which removes those features from the model.

SSE (Cost function) + penalty * | slope |

Where, | slope | = magnitude of slope
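
The way lasso drives a slope to exactly zero can be sketched for the simplest case: one feature, no intercept (data assumed centered). Under that assumption, minimizing SSE + penalty * |slope| gives a "soft threshold": sum(x*y) is pulled toward zero by penalty / 2 and the slope is set to zero if it would cross zero. The function name and data are illustrative.

```python
def lasso_slope(x, y, penalty):
    # Minimizer of sum((y - slope*x)^2) + penalty * |slope| for one feature
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    half = penalty / 2.0
    if sxy > half:
        return (sxy - half) / sxx
    if sxy < -half:
        return (sxy + half) / sxx
    return 0.0  # the penalty outweighs the fit: the feature is dropped

# Hypothetical centered data with a true slope of about 2
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.1, -1.9, 0.0, 2.1, 3.9]

for penalty in (0.0, 10.0, 50.0):
    print(penalty, lasso_slope(x, y, penalty))
```

With the largest penalty the slope is exactly 0.0, which is the feature-selection behavior described above.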