Machine Learning: Linear Regression With One Variable

Supervised Learning

In supervised learning, we are given the "right answer" for each example in the data set. Two main approaches are used in supervised learning:
  1. Regression: predict a real-valued output.
  2. Classification: predict a discrete-valued output.
Linear Regression

Linear regression is an approach for modeling the relationship between an independent variable x and a dependent variable y.

Our goal is to find the best-fit line. The best fit is the line for which the error is minimum, which makes our predictions more accurate.

[Image: a best-fit line drawn through the training data]

Using this line we can predict a value that is not in the data set; since the line is the best fit for the data, the prediction will be more accurate.
For example, given a graph of house prices against size in square feet, we can predict the price of a house for any size using the best-fit line.
 
[Image: house price plotted against size in square feet, with the best-fit line]
I am using a house price example to explain this.
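As a minimal sketch of how such a line is used for prediction, here is the idea in code; the slope and intercept values below are made up purely for illustration:

```python
# Hypothetical best-fit line for the house price example:
# price = intercept + slope * size. Both values are invented for illustration.
intercept = 50_000.0   # base price when size is 0
slope = 120.0          # price increase per square foot

def predict_price(size_sqft: float) -> float:
    """Predict a house price from its size using the fitted line."""
    return intercept + slope * size_sqft

print(predict_price(1406))   # estimated price for a 1406 sq ft house
```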

Terms used most frequently

m = number of training examples
x = input variable / feature
y = output variable / target
(x, y) = a single training example (one row)
(x(i), y(i)) = the i-th training example, where i is a row index, not an exponent
(x(2), y(2)) = (1406, 232)
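In code, this notation maps directly onto arrays. The row (1406, 232) is taken from the example above; the other rows below are made up for illustration:

```python
# Training set: each row is one (x, y) example.
# (1406, 232) comes from the text; the other rows are illustrative.
data = [(2104, 460), (1406, 232), (852, 178)]

m = len(data)                    # m = number of training examples
x = [row[0] for row in data]     # input variable / feature
y = [row[1] for row in data]     # output variable / target

# (x(i), y(i)) = the i-th training example; i is 1-indexed as in the text.
i = 2
print(f"(x({i}), y({i})) = ({x[i - 1]}, {y[i - 1]})")   # -> (1406, 232)
```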

Our learning machine should look like the following:

Input → machine → Estimated output

What is the hypothesis?

hθ(x) = θ0 + θ1x is the equation of the line. θ0 and θ1 are the parameters; the image below shows how they affect the line.

[Image: how changing θ0 and θ1 affects the line]

So we have to find θ0 and θ1 such that hθ(x) is close to y, giving the best line for our training set.
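As a small sketch, the hypothesis can be written as a plain function of θ0 and θ1:

```python
def h(theta0: float, theta1: float, x: float) -> float:
    """Hypothesis: h_theta(x) = theta0 + theta1 * x, a straight line."""
    return theta0 + theta1 * x

# theta0 shifts the line up and down; theta1 controls its slope.
print(h(0.0, 1.0, 2.0))   # -> 2.0
print(h(1.5, 0.0, 2.0))   # -> 1.5 (a flat line at height theta0)
```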

This means we have to get the minimum error between the predicted output and the actual output.

That is, hθ(x) − y should be small (minimum); for a single training example the error is hθ(x(i)) − y(i).

Summed over all the training examples, this can be written as:

Σ from i = 1 to m of (hθ(x(i)) − y(i))

For a regression problem we use the squared error function to measure this difference accurately:

J(θ0, θ1) = (1/2m) · Σ from i = 1 to m of (hθ(x(i)) − y(i))²
J(θ) is called the cost function.
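The cost function translates directly into code; the following is a minimal sketch:

```python
def h(theta0: float, theta1: float, x: float) -> float:
    return theta0 + theta1 * x   # hypothesis: a straight line

def cost(theta0: float, theta1: float, xs: list, ys: list) -> float:
    """Squared error cost: J = (1/2m) * sum of (h(x_i) - y_i)^2."""
    m = len(xs)
    return sum((h(theta0, theta1, xi) - yi) ** 2
               for xi, yi in zip(xs, ys)) / (2 * m)
```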

Let’s understand the cost function

 x  y
 1  1
 2  2
 3  3

Keeping θ0 = 0 fixed (so we can draw the plot in 2D), first suppose θ1 = 1:

[Image: the line hθ(x) = x passing through all three data points]
Then,

J(θ1 = 1) = (1/2m)(0² + 0² + 0²) = 0, since every prediction equals its target with no difference.

Now take θ1 = 0.5 (still with θ0 = 0):

[Image: the line hθ(x) = 0.5x falling below the data points]
J(θ1 = 0.5) = (1/2m)((0.5 − 1)² + (1 − 2)² + (1.5 − 3)²) = 3.5/6 ≈ 0.58
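Both values can be checked numerically with the toy dataset from the table above; this is a small sketch with θ0 fixed at 0:

```python
xs, ys = [1, 2, 3], [1, 2, 3]   # toy dataset from the table

def J(theta1: float) -> float:
    """Cost with theta0 = 0: (1/2m) * sum of (theta1*x - y)^2."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

print(J(1.0))   # -> 0.0 (the line passes through every point)
print(J(0.5))   # -> 0.5833... (= 3.5/6, as computed above)
```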

For each value of θ1 the line on the left changes, and with it the value of J(θ1) plotted on the right:

[Image: J(θ1) plotted against θ1, with the minimum circled at θ1 = 1]
At the circled point we get the minimum error, i.e., the cost function is minimized.

If we use both parameters, we display the cost function using contour plots. The surface looks like a bowl, and the contours are concentric circles; at any point on the same circle the error is the same. Our goal is to move towards the bottom of the bowl, the smallest circle, where the error is minimum.

[Image: contour plot of J(θ0, θ1)]
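Such a contour plot can be produced with matplotlib; this is a sketch using the toy dataset from above, with arbitrarily chosen grid ranges:

```python
import numpy as np
import matplotlib.pyplot as plt

xs = np.array([1.0, 2.0, 3.0])
ys = np.array([1.0, 2.0, 3.0])

t0 = np.linspace(-3.0, 3.0, 100)   # grid of theta0 values
t1 = np.linspace(-1.0, 3.0, 100)   # grid of theta1 values
T0, T1 = np.meshgrid(t0, t1)

# Evaluate J(theta0, theta1) over the whole grid at once.
m = len(xs)
J = sum((T0 + T1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

plt.contour(T0, T1, J, levels=30)   # curves of equal error
plt.xlabel("theta0")
plt.ylabel("theta1")
plt.show()
```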

For this purpose we use an algorithm called gradient descent, which minimizes our cost function.


Repeat until convergence: θj := θj − α · (∂/∂θj) J(θ0, θ1)   (for j = 0 and j = 1, updated simultaneously)

Now it's time to select a learning rate α. It should not be too small, because that makes the algorithm slow, and it should not be too large, because the steps may overshoot the convergence point.

[Images: gradient descent with a small learning rate converging slowly, and with a large learning rate overshooting the minimum]
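Both failure modes can be seen numerically. This sketch uses the toy cost J(θ1) from above, whose derivative with respect to θ1 is (1/m) Σ (θ1·x(i) − y(i))·x(i); the α values are arbitrary choices:

```python
xs, ys = [1, 2, 3], [1, 2, 3]

def dJ(theta1: float) -> float:
    """Derivative of J(theta1) with respect to theta1 (theta0 fixed at 0)."""
    m = len(xs)
    return sum((theta1 * x - y) * x for x, y in zip(xs, ys)) / m

for alpha in (0.01, 0.3, 0.5):   # too small, reasonable, too large
    theta1 = 0.0
    for _ in range(20):
        theta1 -= alpha * dJ(theta1)
    print(f"alpha={alpha}: theta1 = {theta1:.4f} after 20 steps")
```

With α = 0.01 the estimate creeps slowly towards 1, with α = 0.3 it converges quickly, and with α = 0.5 it overshoots and diverges.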
Now, taking the derivatives in the gradient descent algorithm, it becomes:

θ0 := θ0 − α · (1/m) Σ from i = 1 to m of (hθ(x(i)) − y(i))
θ1 := θ1 − α · (1/m) Σ from i = 1 to m of (hθ(x(i)) − y(i)) · x(i)
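Putting the pieces together, here is a minimal sketch of the full algorithm with simultaneous updates; the data, α, and iteration count are arbitrary choices:

```python
def gradient_descent(xs, ys, alpha=0.1, iters=1000):
    """Minimize J(theta0, theta1) with the update rules above."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                              # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m   # dJ/dtheta1
        # Simultaneous update: both gradients use the old theta values.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

theta0, theta1 = gradient_descent([1, 2, 3], [1, 2, 3])
print(theta0, theta1)   # expected to approach 0 and 1 for this toy data
```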

The algorithm works like the following image:

[Image: gradient descent taking steps downhill on the contour plot of J(θ0, θ1)]

The above image is taken from Andrew Ng's Machine Learning course.
