- Linear regression is about finding the best-fit line, so that it can predict future values.
The line equation is: y = mx + c
m – slope
c – constant (intercept)
This is our hypothesis: h(x) = θ₀ + θ₁x
θ₀ – constant (intercept)
θ₁ – slope
h(x) – predicted value
Here, we have to tune the values of θ₀ and θ₁ in such a way that the line best fits the data.
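As a rough sketch in Python (the function and variable names here are illustrative, not part of the original notes), the hypothesis is simply:

```python
def hypothesis(theta0, theta1, x):
    """Predicted value h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# With theta0 = 1.0 (constant) and theta1 = 2.0 (slope), the prediction for x = 3 is 7.0
print(hypothesis(1.0, 2.0, 3))
```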
For that, we define a cost function, which measures the accuracy of the hypothesis.
The cost function J(θ) for linear regression:
J(θ₀, θ₁) = (1 / 2m) Σ (h(xᵢ) - yᵢ)²   (sum over the i = 1 … m samples)
Where, m – number of samples
(1/2) term – for convenience when computing the gradient descent derivatives
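A minimal Python sketch of this cost function (the names `compute_cost`, `x`, `y` and the toy data are illustrative assumptions, not from the notes):

```python
import numpy as np

def compute_cost(theta0, theta1, x, y):
    """J(theta0, theta1) = (1 / 2m) * sum((h(x_i) - y_i)^2)."""
    m = len(x)                          # m - number of samples
    errors = (theta0 + theta1 * x) - y  # h(x_i) - y_i for every sample
    return np.sum(errors ** 2) / (2 * m)

# Points lying exactly on y = 2x + 1 give a cost of 0
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])
print(compute_cost(1.0, 2.0, x, y))  # 0.0
```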
- Ideally, the line should pass through all the points of the dataset; in that case, the cost function J(θ₀, θ₁) → 0.
- The main objective is to minimise the cost function J(θ₀, θ₁) over θ₀ and θ₁.
Linear regression minimises the cost function by finding the proper values of θ₀ and θ₁ → using the Gradient Descent method.
- Gradient Descent Method:
For now, assume θ₀ = 0.
So the hypothesis is hθ(x) = θ₁x
And the cost function becomes J(θ₁) = (1 / 2m) Σ (θ₁xᵢ - yᵢ)²
Here we can see that the cost function depends only on the value of θ₁.
The question is how to reach the minimum of the cost function, i.e. the point where θ₁ equals θ_min.
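A small Python sketch, assuming some toy data, that shows how J(θ₁) varies with θ₁ (the bowl shape with a single minimum):

```python
import numpy as np

def cost_theta1(theta1, x, y):
    """Simplified cost J(theta1) with theta0 fixed at 0: (1 / 2m) * sum((theta1 * x_i - y_i)^2)."""
    m = len(x)
    return np.sum((theta1 * x - y) ** 2) / (2 * m)

# Data generated from y = 2x, so J(theta1) is minimised at theta1 = 2
x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x
for theta1 in [0.0, 1.0, 2.0, 3.0, 4.0]:
    print(theta1, cost_theta1(theta1, x, y))
# The cost drops to 0 at theta1 = 2 and rises again on either side (the bowl shape of J).
```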
Now, start by randomly initialising θ₁.
Suppose θ₁ gets initialised as shown in the figure, and the cost corresponding to that θ₁ is shown in the figure as J(θ₁).
Now, let's update θ₁ using gradient descent.
Here, we take the derivative of the cost function, which gives the slope of the curve at that point. If this slope is positive, we subtract a positive value from the current value of θ₁.
This forces θ₁ to move to the left and slowly converge towards θ_min (where the cost function is minimum).
- α – learning rate → it decides how big a step we take towards the minimum in one iteration.
As we move towards the minimum point, the slope of the curve becomes less steep; since the update is proportional to the slope, the steps automatically get smaller as we approach the minimum value.
When the slope becomes zero at the minimum of the curve, θ₁ is no longer updated.
- The graph shows that when the slope is negative, the value of θ₁ increases, and when it is positive, the value of θ₁ decreases.
- We should adjust the parameter α to ensure that the gradient descent algorithm converges in a reasonable time. Failure to converge, or taking too long to reach the minimum, implies that our step size is wrong. A small sketch of this update loop follows below.
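A hedged sketch of the one-parameter gradient descent loop in Python (the step size, iteration count and data here are arbitrary choices for illustration):

```python
import numpy as np

def gradient_descent_theta1(x, y, alpha=0.1, iterations=100):
    """Repeat theta1 := theta1 - alpha * dJ/dtheta1 for the simplified model h(x) = theta1 * x."""
    m = len(x)
    theta1 = 0.0                                   # arbitrary initialisation
    for _ in range(iterations):
        slope = np.sum((theta1 * x - y) * x) / m   # derivative of J at the current theta1
        theta1 -= alpha * slope                    # positive slope -> move left, negative -> move right
    return theta1

x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x
print(gradient_descent_theta1(x, y))  # converges close to 2.0
```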
Derivative of the Cost Function:
Let's differentiate the cost function with respect to each parameter:
∂J/∂θ₀ = (1/m) Σ (hθ(xᵢ) - yᵢ)
∂J/∂θ₁ = (1/m) Σ (hθ(xᵢ) - yᵢ) · xᵢ
Plugging these into the update rule gives the gradient descent equations (repeat until convergence, updating both parameters simultaneously):
θ₀ := θ₀ - α (1/m) Σ (hθ(xᵢ) - yᵢ)
θ₁ := θ₁ - α (1/m) Σ (hθ(xᵢ) - yᵢ) · xᵢ
If we start with a guess for our hypothesis and then repeatedly apply these gradient descent equations, our hypothesis will become more and more accurate.
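A minimal end-to-end sketch of these update equations in Python (the toy data and hyperparameters are assumptions, not part of the notes):

```python
import numpy as np

def gradient_descent(x, y, alpha=0.05, iterations=2000):
    """Repeatedly apply the gradient descent equations for theta0 and theta1."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        errors = (theta0 + theta1 * x) - y   # h(x_i) - y_i
        grad0 = np.sum(errors) / m           # dJ/dtheta0
        grad1 = np.sum(errors * x) / m       # dJ/dtheta1
        # Simultaneous update: both gradients are computed before either parameter changes
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

# Data from y = 1 + 2x; the fitted parameters come out close to theta0 = 1, theta1 = 2
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x
print(gradient_descent(x, y))
```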