Data Science :: python processing XLS Data - 24loader, Home of exclusive blog and update music and entertaiment

Data Science :: python processing XLS Data

<
Python - Processing XLS Data
Microsoft Excel is a very widely used spread sheet program. Its user friendliness and appealing features makes it a very frequently used tool in Data Science. The Panadas library provides features using which we can read the Excel file in full as well as in parts for only a selected group of Data. We can also read an Excel file with multiple sheets in it. We use the read_excel function to read the data from it.

Input as Excel File

We Create an excel file with multiple sheets in the windows OS. The Data in the different sheets is as shown below.
You can create this file using the Excel Program in windows OS. Save the file as input.xlsx.
# Data in Sheet1
 
id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Tusar,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance
6,Rasmi,578,2013-05-21,IT
7,Pranab,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance
 
# Data in Sheet2
 
id       name     zipcode
1        Rick     301224
2        Dan      341255
3        Tusar   297704
4        Ryan     216650
5        Gary     438700
6        Rasmi   665100
7        Pranab  341211
8        Guru     347480

Reading an Excel File

The read_excel function of the pandas library is used read the content of an Excel file into the python environment as a pandas DataFrame. The function can read the files from the OS by using proper path to the file. By default, the function will read Sheet1.
import pandas as pd
data = pd.read_excel('path/input.xlsx')
print (data)
When we execute the above code, it produces the following result. Please note how an additional column starting with zero as a index has been created by the function.
   id    name  salary  start_date        dept
0   1    Rick  623.30  2012-01-01          IT
1   2     Dan  515.20  2013-09-23  Operations
2   3   Tusar  611.00  2014-11-15          IT
3   4    Ryan  729.00  2014-05-11          HR
4   5    Gary  843.25  2015-03-27     Finance
5   6   Rasmi  578.00  2013-05-21          IT
6   7  Pranab  632.80  2013-07-30  Operations
7   8    Guru  722.50  2014-06-17     Finance

Reading Specific Columns and Rows

Similar to what we have already seen in the previous chapter to read the CSV file, the read_excel function of the pandas library can also be used to read some specific columns and specific rows. We use the multi-axes indexing method called .loc() for this purpose. We choose to display the salary and name column for some of the rows.
import pandas as pd
data = pd.read_excel('path/input.xlsx')
 
# Use the multi-axes indexing funtion
print (data.loc[[1,3,5],['salary','name']])
When we execute the above code, it produces the following result.
   salary   name
1   515.2    Dan
3   729.0   Ryan
5   578.0  Rasmi

Reading Multiple Excel Sheets

Multiple sheets with different Data formats can also be read by using read_excel function with help of a wrapper class named ExcelFile. It will read the multiple sheets into memory only once. In the below example we read sheet1 and sheet2 into two data frames and print them out individually.
import pandas as pd
with pd.ExcelFile('C:/Users/Rasmi/Documents/pydatasci/input.xlsx') as xls:
    df1 = pd.read_excel(xls, 'Sheet1')
    df2 = pd.read_excel(xls, 'Sheet2')
 
print("****Result Sheet 1****")
print (df1[0:5]['salary'])
print("")
print("***Result Sheet 2****")
print (df2[0:5]['zipcode'])
When we execute the above code, it produces the following result.
****Result Sheet 1****
0    623.30
1    515.20
2    611.00
3    729.00
4    843.25
Name: salary, dtype: float64
 
***Result Sheet 2****
0    301224
1    341255
2    297704
3    216650
4    438700
Name: zipcode, dtype: int64

No comments

Theme images by friztin. Powered by Blogger.
//]]>