Error: predicted − actual value
How far the predicted value is from the actual value.
MAE: Mean Absolute Error – average of the absolute values of the errors
- Errors take positive and negative values; the sign indicates the direction from the best-fit line.
- To consider only the magnitude, take the absolute value of each error and then the average, so the negative signs disappear.
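A minimal sketch of the MAE calculation, using assumed toy values for the actual and predicted arrays:

```python
import numpy as np

# Toy data (assumed values, for illustration only)
actual = np.array([3.0, 5.0, 2.0, 7.0])
predicted = np.array([2.5, 5.0, 4.0, 8.0])

errors = predicted - actual            # signed errors: direction from the fit
mae = np.mean(np.abs(errors))          # absolute value removes the sign
print(mae)                             # → 0.875
```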
MSE: Mean Squared Error – average of the squared errors (***sensitive to the presence of outliers***)
- A second way to remove negative signs: square the error values, then take their average.
- But the result is in squared units, so it does not read as a precise error value on the target's scale.
RMSE: Root Mean Squared Error – to compensate for this, take the square root of MSE. (***Gives the spread of the residuals; analogous to the standard deviation of the errors***)
- RMSE is the most popular metric for regression.
- Assumes the errors are unbiased and follow a normal distribution.
- Highly affected by outliers; before using this metric, make sure outliers are removed from the dataset.
- Reliable for large datasets.
- Compared to MAE, RMSE gives higher weight to, and punishes, large errors.
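A sketch of MSE and RMSE on the same assumed toy data; note how the single large error dominates:

```python
import numpy as np

# Toy data (assumed values, for illustration only)
actual = np.array([3.0, 5.0, 2.0, 7.0])
predicted = np.array([2.5, 5.0, 4.0, 8.0])

errors = predicted - actual
mse = np.mean(errors ** 2)       # squaring also removes the sign
rmse = np.sqrt(mse)              # back on the same scale as the target
print(mse, rmse)                 # the error of 2 contributes 4 to the sum
```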
RMSLE: Root Mean Squared Log Error
- Here we take the log of the predicted and actual values before computing RMSE, so it measures the relative (ratio) change rather than the absolute difference.
- It is not much affected by outliers.
- Main goal: do not penalize huge differences between actual and predicted values when both are huge numbers.
- The log scales down the largest values.
- If both predicted and actual values are small: RMSE and RMSLE are close.
- If either the predicted or the actual value is big: RMSE > RMSLE.
- If both predicted and actual values are big: RMSE > RMSLE (RMSLE becomes almost negligible).
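A sketch comparing RMSE and RMSLE on assumed toy values; when both numbers are big, the absolute error is huge but the ratio error stays small:

```python
import numpy as np

def rmse(a, p):
    return np.sqrt(np.mean((p - a) ** 2))

def rmsle(a, p):
    # log1p(x) = log(1 + x), which also avoids log(0) for zero values
    return np.sqrt(np.mean((np.log1p(p) - np.log1p(a)) ** 2))

# Assumed toy values
big_a, big_p = np.array([1000.0]), np.array([1400.0])

print(rmse(big_a, big_p))    # → 400.0 (huge absolute error)
print(rmsle(big_a, big_p))   # small: only the ratio 1400/1000 matters
```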
MAPE: Mean Absolute Percentage Error
- Equivalent to MAE expressed as a percentage of the actual values.
- Both MAE and MAPE are robust to the effect of outliers.
- MAPE is asymmetric: for the same absolute error, actual > predicted gives a smaller MAPE, and actual < predicted gives a larger MAPE, because the actual value sits in the denominator.
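A sketch of MAPE's asymmetry, using an assumed absolute error of 50 in both directions:

```python
import numpy as np

def mape(actual, predicted):
    # actual value in the denominator makes the metric asymmetric
    return np.mean(np.abs((actual - predicted) / actual)) * 100

# Same absolute error of 50, but MAPE differs with direction (assumed values):
low = mape(np.array([150.0]), np.array([100.0]))   # actual > predicted
high = mape(np.array([100.0]), np.array([150.0]))  # actual < predicted
print(low, high)                                   # → 33.33… 50.0
```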
MPE: Mean Percentage Error
- Same as MAPE; the only difference is that it lacks the absolute value.
- MPE shows the bias of the model.
- MPE is a useful metric, since it lets us see whether our model systematically underestimates or overestimates; with the error defined as predicted − actual, overestimation pushes MPE positive and underestimation pushes it negative.
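A sketch of how MPE's sign exposes systematic bias. Note the sign convention here is an assumption (error = predicted − actual); flipping it flips the interpretation:

```python
import numpy as np

def mpe(actual, predicted):
    # no absolute value, so signed errors can cancel; the sign shows bias
    # (convention assumed: error = predicted - actual)
    return np.mean((predicted - actual) / actual) * 100

actual = np.array([100.0, 200.0, 300.0])   # assumed toy values
overestimate = actual + 10                 # model always predicts too high
underestimate = actual - 10                # model always predicts too low

print(mpe(actual, overestimate))           # positive → systematic overestimation
print(mpe(actual, underestimate))          # negative → systematic underestimation
```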
R2: Coefficient of determination
- R2 tells how much of the variability of the dependent variable is explained by the model's independent variables.
- If MSE or RMSE decreases, model performance increases, but these values alone are not intuitive.
- For classification models, suppose a model's accuracy is 0.8; testing it against a random model with accuracy 0.5 gives us a benchmark to judge it against.
- But with RMSE, there is no benchmark model to compare against.
- We can use R2 to make that comparison: R2 = 1 − MSE(model) / MSE(baseline)
- MSE(model): MSE of the predictions against the actual values
- MSE(baseline): MSE of the mean prediction against the actual values
- When we add more independent variables to the model, R2 always increases and never decreases, since it does not check whether the added variables are significant or merely correlated.
- Adjusted R2 does take significance into account: if an added feature is correlated with the output, the adjusted R2 value increases; if the feature is not significant, adjusted R2 slightly decreases.
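A sketch of R² from the model-vs-baseline MSE ratio, plus the standard adjusted-R² formula. The data and the n/k values passed to `adjusted_r2` are assumed toy numbers:

```python
import numpy as np

# Toy data (assumed values, for illustration only)
actual = np.array([3.0, 5.0, 2.0, 7.0, 6.0])
predicted = np.array([2.5, 5.0, 4.0, 8.0, 5.5])

mse_model = np.mean((actual - predicted) ** 2)
mse_baseline = np.mean((actual - actual.mean()) ** 2)  # always predict the mean
r2 = 1 - mse_model / mse_baseline

def adjusted_r2(r2, n, k):
    # n = number of samples, k = number of independent variables;
    # extra predictors are penalized through the (n - k - 1) term
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(r2)
print(adjusted_r2(r2, n=50, k=3))    # fewer predictors → smaller penalty
print(adjusted_r2(r2, n=50, k=10))   # more predictors → larger penalty
```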
Evaluation metrics for Classification:
In classification problems, there are two types of algorithms depending on their outputs:
- Class output: algorithms like SVM and KNN produce a class output directly. For binary classification, the output is 0 or 1.
- Probability output: algorithms like logistic regression, random forest, gradient boosting, AdaBoost etc. give probability outputs. Converting probability outputs to class outputs is just a matter of choosing a threshold probability.
For a balanced dataset, accuracy is used as the evaluation metric. (Balanced dataset – no class imbalance in the dataset.)
Recall/precision: used for an imbalanced dataset.
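A sketch of thresholding probability outputs into class outputs; the probabilities and the 0.5 threshold are assumed (0.5 is a common default, not the only choice):

```python
import numpy as np

# Hypothetical probability outputs from a classifier (assumed values)
probs = np.array([0.10, 0.45, 0.55, 0.90])
threshold = 0.5                        # common default; tune for your problem
classes = (probs >= threshold).astype(int)
print(classes)                         # → [0 0 1 1]
```

Lowering the threshold predicts the positive class more often, trading FP for FN, which is exactly the precision/recall trade-off discussed below.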
TP = classifier predicted TRUE (has disease), and the correct class was TRUE (patient has the disease).
TN = model predicted FALSE (no disease), and the correct class was FALSE (patient does not have the disease).
FP (Type I error) = classifier predicted TRUE, but the correct class was FALSE (patient did not have the disease). E.g. producer's risk.
FN (Type II error) = classifier predicted FALSE (no disease), but the correct class was TRUE (patient actually does have the disease). E.g. consumer's risk.
Accuracy: (TP + TN) / (TP + TN + FP + FN)
Misclassification rate (error rate): (FP + FN) / (TP + TN + FP + FN)
Precision: TP / (TP + FP)
- Out of the total predicted positives, how many were actually positive.
- Optimizing for precision reduces FP errors, but FN may increase.
Recall: TP / (TP + FN)
- Out of the total actual positives, how many did we predict as positive.
- Optimizing for recall reduces FN errors, but FP may increase.
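A sketch of counting the four confusion-matrix cells and deriving the metrics above; the label vectors are assumed toy values (1 = has disease, 0 = no disease):

```python
# Toy labels (assumed values): 1 = has disease, 0 = no disease
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # both TRUE
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # both FALSE
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # Type I error
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # Type II error

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(accuracy, precision, recall)   # → 0.625 0.666… 0.5
```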
F-beta:
- The F-beta score is a weighted harmonic mean of precision and recall: F_beta = (1 + β²) · precision · recall / (β² · precision + recall). (Plain harmonic mean of x and y = 2xy / (x + y); β = 1 recovers it.)
- The beta value may vary; common choices are 1, 0.5, or 2.
- β = 1: FP and FN are equally important to the model (this is the F1 score).
- β < 1: FP has a greater impact than FN, so weight precision more by decreasing β to between 0 and 1 (commonly 0.5).
- β > 1: FN has a greater impact than FP, so weight recall more by increasing β (commonly 2).
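A sketch of how β tilts the score between precision and recall; the precision/recall values are assumed:

```python
def f_beta(precision, recall, beta):
    # weighted harmonic mean; beta = 1 gives the F1 score
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

p, r = 0.75, 0.6                 # assumed values, precision > recall here
print(f_beta(p, r, 1.0))         # balanced
print(f_beta(p, r, 0.5))         # weights precision more → higher score here
print(f_beta(p, r, 2.0))         # weights recall more → lower score here
```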
Bias, Underfitting, Overfitting
- Bias: how far the predicted values are from the actual values; if the average prediction is far off from the actual value, bias is high → the model underfits the data.
- Underfitting: the model captures little or no relation between input and output.
- Overfitting: the model performs well on training data but poorly on other/test data.
Regularization is used in regression to create the best-fit line and reduce the overfitting problem.
- It reduces overfitting by adding some bias, i.e. by penalizing a steeper slope of the best-fit line.
Ridge:
- Because the penalty uses the squared slope, coefficients shrink close to zero but never exactly to zero.
Cost = SSE + λ * (slope^2)
where λ * (slope^2) is the penalty term and λ controls its strength.
Lasso: used to reduce overfitting and also for feature selection.
- Because the penalty uses the magnitude of the slope, some coefficients are reduced exactly to zero, removing those features.
Cost = SSE + λ * |slope|
where |slope| is the magnitude of the slope and λ * |slope| is the penalty term.
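A sketch contrasting the two penalties with scikit-learn. The data is assumed: only the first feature truly drives the target, the rest are noise, so lasso should zero the noise coefficients while ridge only shrinks them:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Toy data (assumed): only feature 0 matters; features 1-4 are pure noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)   # squared penalty: shrinks toward zero
lasso = Lasso(alpha=0.5).fit(X, y)   # |slope| penalty: can hit exactly zero

print(np.round(ridge.coef_, 3))      # small but nonzero noise coefficients
print(np.round(lasso.coef_, 3))      # noise coefficients driven to 0.0
```

The `alpha` arguments play the role of λ above; larger values mean a stronger penalty.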