Data Science :: python processing XLS Data

Python - Processing XLS Data

Microsoft Excel is a very widely used spread sheet program. Its user friendliness and appealing features makes it a very frequently used tool in Data Science. The Panadas library provides features using which we can read the Excel file in full as well as in parts for only a selected group of Data. We can also read an Excel file with multiple sheets in it. We use the read_excel function to read the data from it.

Input as Excel File

We Create an excel file with multiple sheets in the windows OS. The Data in the different sheets is as shown below.

You can create this file using the Excel Program in windows OS. Save the file as input.xlsx.

# Data in Sheet1

id,name,salary,start_date,dept

1,Rick,623.3,2012-01-01,IT

2,Dan,515.2,2013-09-23,Operations

3,Tusar,611,2014-11-15,IT

4,Ryan,729,2014-05-11,HR

5,Gary,843.25,2015-03-27,Finance

6,Rasmi,578,2013-05-21,IT

7,Pranab,632.8,2013-07-30,Operations

8,Guru,722.5,2014-06-17,Finance

# Data in Sheet2

id       name     zipcode

1        Rick     301224

2        Dan      341255

3        Tusar   297704

4        Ryan     216650

5        Gary     438700

6        Rasmi   665100

7        Pranab  341211

8        Guru     347480

Reading an Excel File

The read_excel function of the pandas library is used read the content of an Excel file into the python environment as a pandas DataFrame. The function can read the files from the OS by using proper path to the file. By default, the function will read Sheet1.

import pandas as pd

data = pd.read_excel('path/input.xlsx')

print (data)

When we execute the above code, it produces the following result. Please note how an additional column starting with zero as a index has been created by the function.

   id    name  salary  start_date        dept

0   1    Rick  623.30  2012-01-01          IT

1   2     Dan  515.20  2013-09-23  Operations

2   3   Tusar  611.00  2014-11-15          IT

3   4    Ryan  729.00  2014-05-11          HR

4   5    Gary  843.25  2015-03-27     Finance

5   6   Rasmi  578.00  2013-05-21          IT

6   7  Pranab  632.80  2013-07-30  Operations

7   8    Guru  722.50  2014-06-17     Finance

Reading Specific Columns and Rows

Similar to what we have already seen in the previous chapter to read the CSV file, the read_excel function of the pandas library can also be used to read some specific columns and specific rows. We use the multi-axes indexing method called .loc() for this purpose. We choose to display the salary and name column for some of the rows.

import pandas as pd

data = pd.read_excel('path/input.xlsx')

# Use the multi-axes indexing funtion

print (data.loc[[1,3,5],['salary','name']])

When we execute the above code, it produces the following result.

   salary   name

1   515.2    Dan

3   729.0   Ryan

5   578.0  Rasmi

Reading Multiple Excel Sheets

Multiple sheets with different Data formats can also be read by using read_excel function with help of a wrapper class named ExcelFile. It will read the multiple sheets into memory only once. In the below example we read sheet1 and sheet2 into two data frames and print them out individually.

import pandas as pd

with pd.ExcelFile('C:/Users/Rasmi/Documents/pydatasci/input.xlsx') as xls:

    df1 = pd.read_excel(xls, 'Sheet1')

    df2 = pd.read_excel(xls, 'Sheet2')

print("****Result Sheet 1****")

print (df1[0:5]['salary'])

print("")

print("***Result Sheet 2****")

print (df2[0:5]['zipcode'])

When we execute the above code, it produces the following result.

****Result Sheet 1****

0    623.30

1    515.20

2    611.00

3    729.00

4    843.25

Name: salary, dtype: float64

***Result Sheet 2****

0    301224

1    341255

2    297704

3    216650

4    438700

Name: zipcode, dtype: int64

24loader, Home of exclusive blog and update music and entertaiment

Header Ads

Input as Excel File

Reading an Excel File

Reading Specific Columns and Rows

Reading Multiple Excel Sheets

No comments

Facebook

Videos

Popular

Recent

Comments

Tags

Related Post No.

ads

Popular Posts

Photography

Recent in Sports

Custom Widget

Related Post No.