- Finding the best-fit line, so that it can predict future values.

The line equation is: y = mx + c

m – slope

c – constant (intercept)

This is our hypothesis:

hθ(x) = θ0 + θ1x

θ0 – constant (intercept)

θ1 – slope

h(x) – predicted value
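As a minimal sketch of this hypothesis in Python (the parameter values below are hypothetical, chosen only for illustration):

```python
# Minimal sketch of the hypothesis h(x) = theta0 + theta1 * x
# (the theta0 and theta1 values here are hypothetical, just for illustration)

def hypothesis(x, theta0, theta1):
    """Predicted value for input x under the current parameters."""
    return theta0 + theta1 * x

# Example: with theta0 = 1.0 (intercept) and theta1 = 2.0 (slope),
# an input of x = 3 is predicted as 7.0
print(hypothesis(3, theta0=1.0, theta1=2.0))  # 7.0
```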

Here, we have to tune the values of θ0 and θ1 in such a way that the line best fits the data.

For that we define a cost function, which measures the accuracy of the hypothesis.

The cost function J(θ) for linear regression:

J(θ0, θ1) = (1/2m) · Σ ( hθ(x(i)) − y(i) )²

or, equivalently,

J(θ0, θ1) = (1/2m) · Σ ( θ0 + θ1x(i) − y(i) )²

where the sum runs from i = 1 to m.

Where, m – number of samples

(1/2) term – added for convenience when computing the gradient in gradient descent

- Ideally the line should pass through all the points of the dataset; in that case, the cost function J(θ0, θ1) → 0

- The main objective is to minimise the cost function J(θ0, θ1) with respect to θ0 and θ1

**Linear regression tries to minimize the cost function by finding the proper values of θ0 and θ1, using the Gradient Descent method.**
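A small sketch of this cost function in Python (the sample data and names below are assumptions, used only to show the computation):

```python
# Sketch of the cost function J(theta0, theta1) = (1/2m) * sum((h(x_i) - y_i)^2)
# x and y are hypothetical sample data; m is the number of samples.

def cost(x, y, theta0, theta1):
    m = len(x)
    total = 0.0
    for xi, yi in zip(x, y):
        prediction = theta0 + theta1 * xi      # h(x_i)
        total += (prediction - yi) ** 2        # squared error for one sample
    return total / (2 * m)                     # the 1/2 term simplifies the gradient later

# Example: a perfectly fitting line gives cost ~ 0
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]                              # exactly y = 2x + 1
print(cost(x, y, theta0=1.0, theta1=2.0))     # 0.0
print(cost(x, y, theta0=0.0, theta1=2.0))     # > 0, line no longer passes through all points
```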

**Gradient Descent Method:**

For now, assume θ0 = 0.

So the hypothesis is hθ(x) = θ1x

And the cost function becomes

J(θ1) = (1/2m) · Σ ( θ1x(i) − y(i) )²

Here we can see that the cost function depends only on the value of θ1.

How do we reach the minimum of the cost function, i.e. the point where θ1 equals θmin?

Now, start by randomly initializing θ1.

Suppose θ1 gets initialized as shown in the figure, and the cost corresponding to this θ1 is shown in the figure as J(θ1).

Now, let's update θ1 using gradient descent:

θ1 := θ1 − α · d/dθ1 J(θ1)

Here, we take the derivative of the cost function, which gives the slope of the curve at that point. If this slope is positive, we subtract a positive value from the current value of θ1.

This will force θ1 to move to the left and slowly converge to θmin (where the cost function is minimum).

- α – learning rate → it decides how big a step we take towards the minimum in each iteration.

As we move towards the minimum point, the slope of the curve becomes flatter (smaller in magnitude), which means the update steps automatically become smaller as we approach the minimum value.

When the slope becomes zero at the minimum of the curve, θ1 is no longer updated.
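As a sketch of this update loop for the simplified one-parameter case (θ0 = 0), with toy data and a hand-picked learning rate as assumptions:

```python
# Gradient descent on the one-parameter cost J(theta1) = (1/2m) * sum((theta1*x_i - y_i)^2)
# Toy data and learning rate are assumptions, used only for illustration.

x = [1, 2, 3, 4]
y = [2, 4, 6, 8]          # true relationship is y = 2x, so theta_min = 2
m = len(x)

theta1 = 5.0              # random-ish initialization
alpha = 0.05              # learning rate

for step in range(100):
    # slope of the cost curve at the current theta1:
    # dJ/dtheta1 = (1/m) * sum((theta1*x_i - y_i) * x_i)
    slope = sum((theta1 * xi - yi) * xi for xi, yi in zip(x, y)) / m
    theta1 = theta1 - alpha * slope   # positive slope pushes theta1 left, negative pushes it right

print(theta1)   # approaches 2.0, where the slope of the cost curve is ~0
```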

- The graph shows that when the slope is negative, the value of θ1 increases, and when it is positive, the value of θ1 decreases.

- We should adjust our parameter α to ensure that the gradient descent algorithm converges in a reasonable time. Failure to converge, or taking too long to reach the minimum value, implies that our step size is wrong (see the sketch below).
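A rough sketch of this effect, reusing the one-parameter setup above (the α values are assumptions, chosen only to show the different regimes):

```python
# Effect of the learning rate alpha on the same one-parameter problem as above.
# The alpha values below are hypothetical, chosen only to illustrate the regimes.

x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
m = len(x)

def run_gradient_descent(alpha, steps=50, theta1=5.0):
    for _ in range(steps):
        slope = sum((theta1 * xi - yi) * xi for xi, yi in zip(x, y)) / m
        theta1 = theta1 - alpha * slope
    return theta1

print(run_gradient_descent(alpha=0.05))   # converges towards 2.0 at a reasonable pace
print(run_gradient_descent(alpha=0.001))  # still far from 2.0 after 50 steps: too slow
print(run_gradient_descent(alpha=0.3))    # overshoots back and forth and blows up: step too large
```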

Derivative of the Cost Function:

Let's differentiate the cost function with respect to each parameter:

∂/∂θ0 J(θ0, θ1) = (1/m) · Σ ( hθ(x(i)) − y(i) )

∂/∂θ1 J(θ0, θ1) = (1/m) · Σ ( hθ(x(i)) − y(i) ) · x(i)

So the gradient descent updates become (repeated until convergence):

θ0 := θ0 − α · (1/m) · Σ ( hθ(x(i)) − y(i) )

θ1 := θ1 − α · (1/m) · Σ ( hθ(x(i)) − y(i) ) · x(i)

If we start with a guess for our hypothesis and then repeatedly apply these gradient descent equations, our hypothesis will become more and more accurate.
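Putting the pieces together, here is a minimal sketch of batch gradient descent for both parameters, using the derivatives above (the data, learning rate, and iteration count are assumptions for illustration):

```python
# Batch gradient descent for h(x) = theta0 + theta1 * x,
# using the gradients derived above. Data and hyperparameters are illustrative.

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]     # generated from y = 2x + 1
m = len(x)

theta0, theta1 = 0.0, 0.0
alpha = 0.05

for _ in range(5000):
    # errors h(x_i) - y_i for every sample
    errors = [(theta0 + theta1 * xi) - yi for xi, yi in zip(x, y)]
    # partial derivatives of J with respect to theta0 and theta1
    grad0 = sum(errors) / m
    grad1 = sum(e * xi for e, xi in zip(errors, x)) / m
    # simultaneous update of both parameters
    theta0 -= alpha * grad0
    theta1 -= alpha * grad1

print(theta0, theta1)    # approaches (1.0, 2.0), the true intercept and slope
```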