Introduction
Logistic Regression is a supervised learning method in Machine Learning. It is a statistical method for analyzing a dataset in which one or more independent variables determine an outcome. The basic difference between Linear Regression and Logistic Regression is that the outcome of Linear Regression is continuous, whereas the outcome of Logistic Regression is limited to a discrete set of values (classes). Here, the outcome is the dependent variable.
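The difference between a continuous and a limited outcome can be sketched in a few lines. This is a minimal, hypothetical two-class illustration (the weights, bias and input are made up): the same linear score that Linear Regression would return is passed through the logistic (sigmoid) function, giving a probability, which is then turned into a class. scikit-learn's `LogisticRegression` generalizes this idea to more than two classes.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def linear_score(weights, bias, features):
    """Weighted sum of the features plus a bias, as in Linear Regression (unbounded)."""
    return sum(w * x for w, x in zip(weights, features)) + bias

# Hypothetical coefficients and a single hypothetical input, for illustration only
weights, bias = [0.8, -0.5], 0.1
features = [2.0, 1.0]

z = linear_score(weights, bias, features)   # continuous output, roughly 1.2
p = sigmoid(z)                              # probability of class 1, between 0 and 1
label = 1 if p >= 0.5 else 0                # limited outcome: 0 or 1
print(f"score={z:.2f}, probability={p:.3f}, class={label}")
# prints: score=1.20, probability=0.769, class=1
```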
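I will not go into the details of Logistic Regression. Instead, I will explain how to predict the price of a house from some of its features by using Logistic Regression. Because Logistic Regression predicts one of a limited set of outcomes, every distinct sale price in the training data is treated as a separate class.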
Features of a House
The house price will be predicted from the following features of a house:
- Year Built
- Total Basement Area (sq. ft.)
- Lot Area
- Floor Area
- Overall Condition
- Lot Frontage
- Garage Details
- Fireplace Details
- .......
We will have two types of data:
- Training Data - This data contains the features of houses that have already been sold, including the Year Sold and the Sale Price.
- Test Data - This data contains all the information about a house except its sale price. Based on this information, the Logistic Regression algorithm predicts the selling price of the house.
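Since the only difference between the two files is the sale price, a quick sanity check is that every training row has exactly one more column than every test row. The sketch below assumes, as the code in this article does, that the sale price is the last column of the training file; the column names and values in the miniature files are hypothetical.

```python
def column_counts(lines):
    """Return the set of column counts found in the data rows (the header row is skipped)."""
    return {len(line.strip().split(",")) for line in lines[1:] if line.strip()}

def layouts_match(train_lines, test_lines):
    """True if all training rows have exactly one more column (the sale price) than all test rows."""
    train_counts = column_counts(train_lines)
    test_counts = column_counts(test_lines)
    if len(train_counts) != 1 or len(test_counts) != 1:
        return False                  # some rows have a different number of columns
    return next(iter(train_counts)) == next(iter(test_counts)) + 1

# Hypothetical miniature files: two features, plus the sale price in the training file
train_lines = ["YearBuilt,LotArea,SalePrice\n", "2003,8450,200000\n", "1976,9600,180000\n"]
test_lines = ["YearBuilt,LotArea\n", "1961,11622\n"]
print(layouts_match(train_lines, test_lines))   # True
```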
Implementation
It will be implemented in Python. The following imports are required.
```python
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
```
Read training data.
```python
# Read all lines of the training file (the first line is the header)
with open("train_data.csv", "r") as tr:
    records = tr.readlines()
```
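Python's standard `csv` module can do the splitting and header handling for you. This is a minimal alternative sketch, not the article's original code; the helper name `read_int_rows` and the sample rows are hypothetical, and an in-memory file stands in for `train_data.csv`.

```python
import csv
import io

def read_int_rows(file_obj):
    """Read CSV rows from an open file object, skip the header row and convert every field to int."""
    reader = csv.reader(file_obj)
    next(reader, None)                # skip the header row (if the file is not empty)
    return [[int(value) for value in row] for row in reader if row]

# In the real script this would be:
#     with open("train_data.csv", newline="") as tr:
#         rows = read_int_rows(tr)
# Here an in-memory file stands in for train_data.csv.
sample = io.StringIO("YearBuilt,LotArea,SalePrice\n2003,8450,200000\n1976,9600,180000\n")
rows = read_int_rows(sample)
print(rows)   # [[2003, 8450, 200000], [1976, 9600, 180000]]
```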
Build the training feature vectors (X) and labels (y).
```python
X = []   # feature vectors, one per house
y = []   # sale prices (the labels)
for line in records[1:]:   # skip the header row
    values = [int(value) for value in line.strip().split(",")]
    X.append(values[:-1])   # every column except the last is a feature
    y.append(values[-1])    # the last column (index 36) is the sale price
```
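The same split into features and labels can also be written with NumPy slicing, which is installed as a scikit-learn dependency. This sketch assumes every row has the same number of columns and that the sale price is the last one; the helper name and sample rows are hypothetical.

```python
import numpy as np

def split_features_and_price(rows):
    """Split integer rows into a feature matrix X (all but the last column) and labels y (last column)."""
    data = np.asarray(rows, dtype=np.int64)
    if data.ndim != 2:
        raise ValueError("expected a 2-D table: a list of equally long rows")
    return data[:, :-1], data[:, -1]

X_demo, y_demo = split_features_and_price([[2003, 8450, 200000], [1976, 9600, 180000]])
print(X_demo.shape)   # (2, 2)
print(y_demo)         # [200000 180000]
```

`LogisticRegression.fit` accepts these arrays just as it accepts the nested lists built above.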
Train the Logistic Regression model on the training set.
```python
# Fit the model on the training vectors and their sale prices
lr = LogisticRegression()
lr.fit(X, y)
```
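Because Logistic Regression is a classifier, scikit-learn treats every distinct value in `y` as a separate class and stores them, sorted, in the fitted model's `classes_` attribute. As a consequence, the model can only ever predict a price that appeared in the training data. The toy data below is hypothetical, and `max_iter=1000` (the default is 100) is an assumption added to help the solver converge.

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one feature (floor area in hundreds of sq. ft.) and three prices
X_toy = [[8], [9], [10], [20], [21], [22], [35], [36], [37]]
y_toy = [100000, 100000, 100000, 200000, 200000, 200000, 300000, 300000, 300000]

model = LogisticRegression(max_iter=1000)   # more iterations than the default, to help convergence
model.fit(X_toy, y_toy)

print(model.classes_)                 # the distinct prices seen during training
predictions = model.predict([[15], [30]])
print(predictions)                    # each prediction is one of model.classes_
```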
Read testing data for which the prediction will be performed.
```python
# Read all lines of the test file (the first line is the header)
with open("test_data.csv", "r") as te:
    records1 = te.readlines()
```
Create the test feature vectors.
```python
XX = []   # feature vectors, one per test house
yy = []   # will hold the predicted prices
for line in records1[1:]:   # skip the header row
    XX.append([int(value) for value in line.strip().split(",")])
```
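Before predicting, it can help to confirm that every test vector has as many features as the model was trained on; otherwise `predict` fails with a less specific error. This sketch relies on the `n_features_in_` attribute of fitted scikit-learn estimators (available in scikit-learn 0.24 and later); the helper name `check_feature_count` and the toy model are hypothetical.

```python
from sklearn.linear_model import LogisticRegression

def check_feature_count(model, vectors):
    """Raise ValueError if any vector's length differs from the model's feature count."""
    expected = model.n_features_in_
    for index, vector in enumerate(vectors, start=1):
        if len(vector) != expected:
            raise ValueError(
                f"test house {index} has {len(vector)} features, expected {expected}"
            )

# Hypothetical toy model with two features
model = LogisticRegression().fit([[1, 2], [2, 1], [8, 9], [9, 8]], [0, 0, 1, 1])
check_feature_count(model, [[3, 3], [7, 7]])   # passes silently
```

In the article's script this would be called as `check_feature_count(lr, XX)`.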
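Now, predict the sale prices of the test houses by using the trained Logistic Regression model, and write the prediction result to a new CSV file.
```python
# Predict one sale price per test house and store them in yy
yy.extend(lr.predict(XX))

# Write the predictions to a new CSV file
with open("predictionresult.csv", "w") as result:
    print("Writing to File")
    result.write("House No,Predicted Price" + "\n")
    for i in range(len(yy)):
        result.write(str(i + 1) + "," + str(yy[i]) + "\n")
```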
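The standard `csv` module can also write the result file and takes care of separators and line endings. This is a minimal alternative sketch; the helper name `write_predictions` is hypothetical, and an in-memory buffer stands in for `predictionresult.csv`.

```python
import csv
import io

def write_predictions(file_obj, predictions):
    """Write the header row and one 'house number, predicted price' row per prediction."""
    writer = csv.writer(file_obj)
    writer.writerow(["House No", "Predicted Price"])
    for house_no, price in enumerate(predictions, start=1):
        writer.writerow([house_no, price])

# In the real script:
#     with open("predictionresult.csv", "w", newline="") as result:
#         write_predictions(result, yy)
buffer = io.StringIO()
write_predictions(buffer, [144000, 157900])
print(buffer.getvalue().splitlines())   # ['House No,Predicted Price', '1,144000', '2,157900']
```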
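Check the accuracy of the model. Because the test data does not contain sale prices, the accuracy is measured on the training data.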
```python
# Predict the training houses again and compare with their known prices
yyy = lr.predict(X)
accuracy = accuracy_score(y, yyy) * 100
print("model accuracy")
print(accuracy)
```
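`accuracy_score` returns the fraction of predictions that exactly match the true labels. For house prices this means a prediction that is close but not identical (for example 152000 instead of 150000) counts as wrong. The prices below are hypothetical.

```python
from sklearn.metrics import accuracy_score

# Hypothetical known prices and predictions for four houses
actual = [120000, 150000, 175000, 210000]
predicted = [120000, 152000, 175000, 210000]   # the second house is off by 2000

score = accuracy_score(actual, predicted) * 100
manual = sum(a == p for a, p in zip(actual, predicted)) / len(actual) * 100
print(score, manual)   # 75.0 75.0
```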
The predicted house prices look like this (first 18 test houses):
| House No | Predicted Price |
|---|---|
| 1 | 144000 |
| 2 | 157900 |
| 3 | 175000 |
| 4 | 215000 |
| 5 | 275000 |
| 6 | 178000 |
| 7 | 194500 |
| 8 | 178900 |
| 9 | 260000 |
| 10 | 167900 |
| 11 | 236500 |
| 12 | 83000 |
| 13 | 106000 |
| 14 | 148500 |
| 15 | 160200 |
| 16 | 320000 |
| 17 | 235128 |
| 18 | 230000 |
The model accuracy on the training data is reported as follows:
model accuracy 68.3561643836
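Since the accuracy is the share of exact matches, the reported percentage can be turned back into a count of houses. This worked calculation assumes the training file has 1460 data rows, as the `range(1460)` in the original version of the code suggests.

```python
# Assumption: the training file has 1460 data rows, as range(1460) in the original code suggests
n_training_rows = 1460
reported_accuracy = 68.3561643836   # percent, from the output above

exact_matches = round(reported_accuracy / 100 * n_training_rows)
print(exact_matches)                                     # 998
print(f"{exact_matches / n_training_rows * 100:.10f}")   # 68.3561643836
```

Under that assumption, 998 training houses were assigned exactly their own sale price and the remaining 462 were not.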
Code Execution Details
I have attached a zip file containing the Python code and the training and test CSV data. Python 3.0 or above should be installed. The script writes the prediction result to the predictionresult.csv file. Please make sure that all the libraries imported at the top of the script (scikit-learn) are installed.
Conclusion
Logistic Regression is a very useful part of Machine Learning. It is used in various fields, such as medicine, banking and the social sciences. It predicts values based on what it learns from the training dataset, so the quality of the training dataset largely determines how accurate its predictions are.