Data Processing Using Python & Pandas

Introduction

Python has become one of the most popular dynamic programming languages, alongside Ruby and Perl. It is well suited to data analysis, scientific computing and data visualization, and it is an excellent language for building data-centric applications thanks to libraries such as NumPy, Pandas and Matplotlib.

Pandas provides well-designed data structures and functions that make working with tabular data fast and easy. In this article, I will explain how to create a DataFrame and work with its data using the Pandas library. The DataFrame is the central structure for data analysis in Pandas.

Installation Required

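  1. Python 3 must be installed. This article was written with Python 3.5; current pandas releases require a newer Python 3 version.
  2. pip3 must be installed and up to date: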

    Go to command prompt → Type Command → pip3 install --upgrade pip

  3. Install pandas:

    Go to command prompt → Type Command → pip3 install pandas

  4. Install Jupyter Notebook, which lets you write and run Python and pandas code interactively in the browser:

    Go to command prompt → Type Command → pip3 install jupyter

  5. Open Jupyter notebook:

    Go to command prompt → Type Command → jupyter notebook

  6. Jupyter Notebook will open in your browser.

Here, create a new Python notebook and import the Pandas library:

import pandas as pd
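To confirm that the installation worked, a quick sanity check like the following can be run in the first notebook cell. This is a minimal sketch; the version string it prints depends on what pip installed, and the tiny DataFrame named check is purely illustrative.

```python
import pandas as pd

# Print the installed pandas version; an ImportError above means the install failed
print(pd.__version__)

# Build a tiny illustrative DataFrame to confirm pandas works end to end
check = pd.DataFrame({"x": [1, 2, 3]})
print(check.shape)  # (3, 1)
```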

Prepare Data

Series

Series is one of the core data structures of the Pandas library. It is a one-dimensional labeled array, similar to an array, a list, or a single column in a table. Below is an example of the code:

    import pandas as pd

    # Each Series holds one purchase; the dictionary keys become its labels
    purchase_1 = pd.Series({
        'Name': 'Chris',
        'Item Purchased': 'Pencil',
        'Cost': 22.50
    })
    purchase_2 = pd.Series({
        'Name': 'Ram',
        'Item Purchased': 'Book',
        'Cost': 220.50
    })
    purchase_3 = pd.Series({
        'Name': 'Mohan',
        'Item Purchased': 'Pen',
        'Cost': 22.50
    })
    purchase_4 = pd.Series({
        'Name': 'Gulam',
        'Item Purchased': 'Diary',
        'Cost': 22.50
    })

    # Combine the Series into a DataFrame: one row per Series, labeled by store
    df = pd.DataFrame([purchase_1, purchase_2, purchase_3, purchase_4],
                      index=['Store 1', 'Store 2', 'Store 3', 'Store 4'])

    # Display the first rows (up to five by default)
    df.head()

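Here, pd.Series creates a one-dimensional labeled array for each purchase, with the dictionary keys (Name, Item Purchased, Cost) as its labels. pd.DataFrame then combines the four Series into a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes: each Series becomes one row, the Series labels become the column names, and the index argument supplies the row labels Store 1 to Store 4.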

Press Ctrl + Enter in Jupyter to run the cell. The output is a table with four rows (Store 1 to Store 4) and the columns Name, Item Purchased and Cost.
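The difference between the one-dimensional Series and the two-dimensional DataFrame can be checked directly. This is a minimal sketch that rebuilds a two-store version of the article's data so it runs on its own; the variable names purchase and other are illustrative.

```python
import pandas as pd

# One purchase as a one-dimensional Series: the dict keys become the index labels
purchase = pd.Series({"Name": "Chris", "Item Purchased": "Pencil", "Cost": 22.50})
print(purchase.ndim)         # 1
print(list(purchase.index))  # ['Name', 'Item Purchased', 'Cost']

# A list of Series becomes a DataFrame with one row per Series
other = pd.Series({"Name": "Ram", "Item Purchased": "Book", "Cost": 220.50})
df = pd.DataFrame([purchase, other], index=["Store 1", "Store 2"])
print(df.ndim)           # 2
print(df.shape)          # (2, 3): 2 rows (stores), 3 columns (Series labels)
print(list(df.columns))  # ['Name', 'Item Purchased', 'Cost']
```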

Fetch the row for Store 1:

    df.loc['Store 1']

Press Ctrl + Enter to run the cell. The output is a Series holding the Store 1 row: Name is Chris, Item Purchased is Pencil and Cost is 22.5.
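The .loc accessor selects rows by their index label. Pandas also offers .iloc, which selects by integer position; for the first row both return the same Series. A minimal sketch with a two-store copy of the article's data (row_by_label and row_by_position are illustrative names):

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Chris", "Ram"], "Item Purchased": ["Pencil", "Book"], "Cost": [22.50, 220.50]},
    index=["Store 1", "Store 2"],
)

# .loc selects by index label, .iloc by integer position
row_by_label = df.loc["Store 1"]
row_by_position = df.iloc[0]
print(row_by_label.equals(row_by_position))  # True
```

Label-based access keeps working if rows are reordered, while positional access does not, which is why .loc is usually the safer choice when rows have meaningful labels.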
Get all data for "Item Purchased":

    df['Item Purchased']

Output

A Series indexed by Store 1 to Store 4, holding Pencil, Book, Pen and Diary.
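Indexing with a single column name returns a Series, while indexing with a list of column names returns a DataFrame. A minimal sketch on a two-store copy of the article's data (items and subset are illustrative names):

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Chris", "Ram"], "Item Purchased": ["Pencil", "Book"], "Cost": [22.50, 220.50]},
    index=["Store 1", "Store 2"],
)

items = df["Item Purchased"]              # single label -> Series
subset = df[["Name", "Item Purchased"]]   # list of labels -> DataFrame
print(type(items).__name__)   # Series
print(type(subset).__name__)  # DataFrame
print(items.tolist())         # ['Pencil', 'Book']
```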
Get the cost of Store 1:

    df.loc['Store 1', 'Cost']

Output
22.5
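Passing a (row, column) pair to .loc returns a single value. For one scalar, pandas also provides the .at accessor, and a slice in the row position returns a whole column instead. A minimal sketch on a two-store copy of the article's data:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Chris", "Ram"], "Item Purchased": ["Pencil", "Book"], "Cost": [22.50, 220.50]},
    index=["Store 1", "Store 2"],
)

print(df.loc["Store 1", "Cost"])   # 22.5: one row label, one column label
print(df.at["Store 1", "Cost"])    # 22.5: .at is dedicated to single values
print(df.loc[:, "Cost"].tolist())  # [22.5, 220.5]: all rows, one column
```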

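Transpose the DataFrame so that columns become rows:

    df.T

Output

A DataFrame with Name, Item Purchased and Cost as rows and Store 1 to Store 4 as columns.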
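Transposing swaps the two axes, so the shape is reversed and the old row labels become column labels. A minimal sketch on a two-store copy of the article's data:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Chris", "Ram"], "Item Purchased": ["Pencil", "Book"], "Cost": [22.50, 220.50]},
    index=["Store 1", "Store 2"],
)

print(df.shape)            # (2, 3)
print(df.T.shape)          # (3, 2): rows and columns swapped
print(list(df.T.columns))  # ['Store 1', 'Store 2']
print(list(df.T.index))    # ['Name', 'Item Purchased', 'Cost']
```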
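Get the cost data for all stores from the transposed DataFrame:

    df.T.loc['Cost']

Output

The Cost row of the transposed DataFrame: 22.5 for Store 1, 220.5 for Store 2, and 22.5 for Store 3 and Store 4.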
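Reading Cost through the transpose works, but because the DataFrame mixes text and numbers, the transposed values are stored with the generic object dtype. Selecting the column directly keeps the numeric float64 dtype. A minimal sketch comparing both routes on a two-store copy of the article's data (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Chris", "Ram"], "Item Purchased": ["Pencil", "Book"], "Cost": [22.50, 220.50]},
    index=["Store 1", "Store 2"],
)

cost_via_transpose = df.T.loc["Cost"]  # row of the transposed (mixed-type) frame
cost_direct = df["Cost"]               # column of the original frame
print(cost_via_transpose.dtype)        # object
print(cost_direct.dtype)               # float64

# Converting back to float gives the same values as direct column access
print(cost_via_transpose.astype(float).equals(cost_direct))  # True
```

For numeric work, selecting the column directly avoids the extra conversion step.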

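Drop Store 1:

    df.drop('Store 1')

Output

A DataFrame containing only Store 2, Store 3 and Store 4. Note that drop returns a new DataFrame and leaves df itself unchanged unless the result is assigned back, for example df = df.drop('Store 1').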
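The fact that drop leaves the original DataFrame untouched can be verified, and the same method removes columns through its columns keyword. A minimal sketch on a two-store copy of the article's data (without_store_1 and no_cost are illustrative names):

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Chris", "Ram"], "Item Purchased": ["Pencil", "Book"], "Cost": [22.50, 220.50]},
    index=["Store 1", "Store 2"],
)

without_store_1 = df.drop("Store 1")  # returns a new DataFrame
print(len(df))                        # 2: the original still has both rows
print(len(without_store_1))           # 1

# Dropping a column instead of a row
no_cost = df.drop(columns="Cost")
print(list(no_cost.columns))          # ['Name', 'Item Purchased']
```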

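Multiply Cost by 0.8 (a 20% discount):

    df['Cost'] *= 0.8
    df

Output

The same DataFrame with every Cost reduced to 80% of its original value: 18.0 for Store 1, Store 3 and Store 4, and 176.4 for Store 2. Store 1 is still present because the earlier drop was not assigned back to df.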
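The compound assignment overwrites the Cost column in place. When the original prices should be kept, the discounted value can be stored in a separate column instead. A minimal sketch on a two-store copy of the article's data; the column name "Discounted Cost" is a hypothetical choice for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {"Item Purchased": ["Pencil", "Book"], "Cost": [22.50, 220.50]},
    index=["Store 1", "Store 2"],
)

# Option 1: keep Cost and add the discounted price as a new (hypothetical) column
df["Discounted Cost"] = df["Cost"] * 0.8

# Option 2: overwrite Cost in place, as in the article (this mutates df)
df["Cost"] *= 0.8

print(df["Cost"].round(2).tolist())  # [18.0, 176.4]
```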

Conclusion

The Pandas library is well suited to data analysis and data preparation in Python, and Jupyter is a convenient environment for writing code, running it and displaying the results.

Please find the attached Python code for more details.

 
