**Error**: predicted − actual values

How far the predicted value is from the actual value.

**MAE**: average of the absolute value of the error

- Errors take positive and negative values; the sign indicates the direction from the best-fit line.
- To keep only the magnitude, we take the absolute value of each error and then average, so the negative signs disappear.

**MSE**: average of the squared errors (*detects the presence of outliers*)

- A second way to remove negative signs: square each error value and then take the average.
- But the result is in squared units, so it does not read as a precise error value.

**RMSE**: To compensate for this, take the square root of the MSE. (*Gives the spread of the residuals; analogous to the standard deviation of the errors.*)

- RMSE is the most popular metric for regression.
- Assumes the errors follow a normal distribution and are unbiased.
- Highly affected by outliers, so remove outliers from the dataset before using this metric.
- Reliable for large datasets.
- Compared to MAE, RMSE gives higher weight to, and punishes, large errors.
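As a minimal sketch with made-up numbers, the three metrics above can be computed directly in NumPy:

```python
import numpy as np

# Made-up actual and predicted values for illustration
actual = np.array([3.0, 5.0, 2.5, 7.0])
predicted = np.array([2.5, 5.0, 4.0, 8.0])

error = predicted - actual        # signed: the sign gives the direction
mae = np.mean(np.abs(error))      # absolute value drops the sign
mse = np.mean(error ** 2)         # squaring drops the sign but squares the units
rmse = np.sqrt(mse)               # the square root restores the original units

print(mae, mse, rmse)  # 0.75 0.875 0.935...
```

Note RMSE ≥ MAE on the same data, because squaring weights the large errors more.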

**RMSLE**:

- Here we take the log of the predicted and actual values before computing RMSE, so it measures relative (ratio) differences rather than absolute ones.
- It is not much affected by outliers.
- The main goal: do not penalize huge differences between actual and predicted values when both are huge numbers.
- The log is used to scale down the largest values.

**If both predicted and actual values are small: RMSE and RMSLE are roughly the same.**

**If either the predicted or the actual value is big: RMSE > RMSLE.**

**If both predicted and actual values are big: RMSE > RMSLE (RMSLE becomes almost negligible).**
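A toy sketch of the "both values big" case, assuming the common log1p formulation of RMSLE (all numbers invented):

```python
import numpy as np

def rmse(actual, predicted):
    return np.sqrt(np.mean((predicted - actual) ** 2))

def rmsle(actual, predicted):
    # log1p(x) = log(1 + x); the +1 keeps zero values from breaking the log
    return np.sqrt(np.mean((np.log1p(predicted) - np.log1p(actual)) ** 2))

# Both values big: the absolute gap is huge, the relative gap is small
big_actual = np.array([1000.0])
big_pred = np.array([1200.0])
print(rmse(big_actual, big_pred))   # 200.0
print(rmsle(big_actual, big_pred))  # ~0.18 -- almost negligible
```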

**MAPE**: Mean absolute percentage error

- Equivalent to MAE expressed as a percentage of the actual values.
- Both MAE and MAPE are robust to the effect of outliers.

**If the actual value is more than the predicted → MAPE is smaller; if the actual is less than the predicted → MAPE is larger.**

**MPE**: Mean percentage error

- Same as MAPE; the only difference is that it lacks the **absolute value**.
- MPE shows the **bias of the model**.
- MPE is a useful metric, since it allows us to see whether our model systematically underestimates (more negative) or overestimates (more positive).
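A small sketch of MAPE vs. MPE with invented numbers; the sign convention (predicted − actual) / actual is an assumption chosen so that underestimation comes out negative, matching the note above:

```python
import numpy as np

# Invented values; sign convention: (predicted - actual) / actual
actual = np.array([100.0, 200.0, 50.0])
predicted = np.array([110.0, 180.0, 55.0])

pct_error = (predicted - actual) / actual
mape = np.mean(np.abs(pct_error)) * 100  # always positive
mpe = np.mean(pct_error) * 100           # keeps the sign, so it exposes bias

print(mape, mpe)  # MPE > 0 here: the model overestimates on average
```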

**R2:**

**R2** gives information about how much of the variability of the dependent variable is explained by the model.

- If MSE or RMSE decreases, model performance increases. But these values alone are not intuitive.
- In the case of classification models, if a model's accuracy is 0.8, we can test it against a random model with accuracy 0.5; this random model is treated as our benchmark.
- But with RMSE there is no benchmark model to compare against.
- We can use R2 to compare models: R2 = 1 − MSE(model) / MSE(baseline)

- MSE(model): MSE of the predictions against the actual values
- MSE(baseline): MSE of the mean prediction against the actual values
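Using this ratio, R2 can be sketched with toy values (all numbers invented):

```python
import numpy as np

# Invented values for illustration
actual = np.array([3.0, 5.0, 2.5, 7.0])
predicted = np.array([2.5, 5.0, 4.0, 8.0])

mse_model = np.mean((predicted - actual) ** 2)
mse_baseline = np.mean((np.mean(actual) - actual) ** 2)  # always predict the mean

r2 = 1 - mse_model / mse_baseline  # fraction of variability the model explains
print(r2)
```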

**Adjusted R2:**

- When we add more independent variables to the model, R2 always increases and never decreases, because it does not account for whether the variables are significant or merely correlated.
- Adjusted R2 does account for significant / correlated variables. If an added feature is genuinely correlated with the output, the adjusted R2 value will increase; otherwise, if the feature is not significant, the adjusted R2 value slightly decreases.
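A minimal sketch of the usual adjusted-R2 formula, 1 − (1 − R²)(n − 1)/(n − p − 1), with invented sample and feature counts:

```python
def adjusted_r2(r2, n_samples, n_features):
    # Penalizes R^2 for each extra feature: only features that improve the
    # fit by more than chance will raise the adjusted value.
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# Same raw R^2, but more features -> lower adjusted R^2
print(adjusted_r2(0.80, n_samples=50, n_features=2))
print(adjusted_r2(0.80, n_samples=50, n_features=10))
```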

**Evaluation metrics for classification:**

In a classification problem, there are two types of algorithms depending on their outputs:

**Class output**: Algorithms like SVM and KNN create a class output. For binary classification, the output will be 0 or 1.

**Probability output**: Algorithms like logistic regression, random forest, gradient boosting, AdaBoost, etc. give probability outputs. Converting probability outputs to class outputs is just a matter of choosing a threshold probability.
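For instance, thresholding hypothetical probability outputs at 0.5:

```python
# Hypothetical positive-class probabilities from e.g. logistic regression
probs = [0.91, 0.42, 0.65, 0.08]

threshold = 0.5  # a common default; tune it to trade precision against recall
classes = [1 if p >= threshold else 0 for p in probs]
print(classes)  # [1, 0, 1, 0]
```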

For a balanced dataset, accuracy is used as the evaluation metric (balanced dataset: no class bias in the dataset).

Recall/precision: used for an imbalanced dataset.

**Confusion matrix:**

**TP** = The classifier predicted TRUE (they have the disease), and the correct class was TRUE (the patient has the disease).

**TN** = The model predicted FALSE (no disease), and the correct class was FALSE (the patient does not have the disease).

**FP (Type I error)** = The classifier predicted TRUE, but the correct class was FALSE (the patient did not have the disease). E.g. producer's risk.

**FN (Type II error)** = The classifier predicted FALSE (the patient does not have the disease), but the correct class was TRUE (they actually do have the disease). E.g. consumer's risk.

**Accuracy**: (TP + TN) / (TP + TN + FN + FP)

**Misclassification rate (error rate)**: (FP + FN) / (TP + TN + FN + FP)

**Precision**: TP / (TP + FP)

- Out of the total predicted positive values, how many were actually positive.
- With precision we are trying to reduce FP errors, but FN may stay higher.

**Recall**: TP / (TP + FN)

- Out of the total actual positive values, how many did we predict positive.
- With recall we are trying to reduce FN, but FP may stay higher.
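The four counts and the metrics above can be sketched from hypothetical screening results (labels invented for illustration):

```python
# Hypothetical screening results: 1 = has disease, 0 = no disease
actual    = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 1, 0, 0, 0]

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # predicted TRUE, was TRUE
tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # predicted FALSE, was FALSE
fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # Type I error
fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # Type II error

accuracy  = (tp + tn) / (tp + tn + fn + fp)
precision = tp / (tp + fp)  # of predicted positives, how many were right
recall    = tp / (tp + fn)  # of actual positives, how many we caught

print(tp, tn, fp, fn)             # 2 3 1 2
print(accuracy, precision, recall)
```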

**F-beta:**

- The beta value may vary; it can be 1, 0.5, or 2.
- The F-beta score combines precision and recall for a classification problem; with β = 1 it is their harmonic mean.

Harmonic mean = 2xy / (x + y)

F_β = (1 + β²) × precision × recall / (β² × precision + recall)

- β = 1: when FP and FN are equally important to the model.
- β between 0 and 1 (mostly 0.5): when FP has greater impact than FN, favor precision, so decrease β.
- β > 1 (e.g. 2): when FN has greater impact than FP, favor recall, so increase β.
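A sketch of the general F-beta formula, F_β = (1 + β²)·P·R / (β²·P + R), with made-up precision and recall scores:

```python
def f_beta(precision, recall, beta):
    # beta < 1 weights precision more; beta > 1 weights recall more
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

p, r = 0.9, 0.5  # invented scores where precision is the stronger of the two
print(f_beta(p, r, beta=1.0))  # harmonic mean of the two
print(f_beta(p, r, beta=0.5))  # leans toward precision, so comes out higher
print(f_beta(p, r, beta=2.0))  # leans toward recall, so comes out lower
```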

**AUC-ROC:**

**Log loss:**

**Gini Coefficients:**

**Bias, Underfitting, Overfitting**

Bias:

- How far the predicted values are from the actual ones, i.e. if the average predicted values are far off from the actual values, then bias is high → the model underfits the data.
- The model captures little to no relation between input and output.

Variance:

- High variance: the model performs well on training data but not on other data, such as test data → the model overfits the training data.

**Regularization:**

Regularization is used to create the best-fit line in regression while reducing the overfitting problem.

**Ridge**:

- It reduces overfitting by adding some bias, i.e. by penalizing a steeper slope of the best-fit line.
- In ridge, coefficients shrink toward zero but never become exactly zero, because the slope is squared.

**SSE (cost function) + penalty × (slope)²**

where penalty × (slope)² is the penalty term

**Lasso**: used to reduce overfitting, and also for feature selection.

- Because the penalty uses the magnitude of the slope, some feature coefficients are reduced exactly to zero.

**SSE (cost function) + penalty × |slope|**

where |slope| = magnitude of the slope
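A minimal NumPy sketch of the ridge idea using the closed-form solution (X, y, and the penalty value are all invented); note the slope shrinks toward zero but stays nonzero:

```python
import numpy as np

# Invented toy data: y is roughly 3x
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.1, 5.9, 9.2, 11.8])

def ridge_slope(X, y, penalty):
    # Closed-form ridge solution: w = (X^T X + penalty * I)^(-1) X^T y
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + penalty * np.eye(n_features), X.T @ y)

w_ols = ridge_slope(X, y, penalty=0.0)     # plain least squares
w_ridge = ridge_slope(X, y, penalty=10.0)  # penalized: slope shrinks toward zero

print(w_ols[0], w_ridge[0])
```

Lasso has no such closed form (the |slope| term is not differentiable at zero), which is why libraries solve it iteratively; its penalty is what lets coefficients reach exactly zero.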