Pandas

Introduction

In pandas, data arranged in rows and columns, that is, in the form of a table, is called a data frame (DataFrame).
A single row or a single column of a data frame is called a series (Series).
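
To make the two terms concrete, here is a minimal sketch. The three-row table is hypothetical and only stands in for real data.

```python
import pandas as pd

# Hypothetical three-row table, used only for illustration.
df = pd.DataFrame({
    "name": ["Afghanistan", "Albania", "Algeria"],
    "country-code": [4, 8, 12],
})

column = df["name"]  # a single column
row = df.iloc[0]     # a single row (the first one)

print(type(df).__name__)      # DataFrame
print(type(column).__name__)  # Series
print(type(row).__name__)     # Series
```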

Reading Data

import pandas as pd

url = 'https://github.com/lukes/ISO-3166-Countries-with-Regional-Codes/blob/master/all/all.csv?raw=true'
df = pd.read_csv(url)
# the ?raw=true query parameter makes GitHub return the raw CSV file instead of the HTML page
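
read_csv also accepts a local path or any file-like object, which is handy when there is no network access. The sketch below assumes a small inline CSV (hypothetical rows, with column names borrowed from the file above) in place of the download.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded file: three rows, three columns.
csv_text = """name,alpha-2,country-code
Afghanistan,AF,4
Albania,AL,8
Algeria,DZ,12
"""

# io.StringIO wraps the text so read_csv can treat it like an open file.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```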

Some important attributes of pandas

df.shape
# (249, 11) -> (rows, columns)
df.size
# total number of entries (rows x columns)
# 2739
df.dtypes
# gives the data type of each column
# string columns show up as object in pandas
# name                         object
# alpha-2                      object
# alpha-3                      object
# country-code                  int64
# iso_3166-2                   object
# region                       object
# sub-region                   object
# intermediate-region          object
# region-code                 float64
# sub-region-code             float64
# intermediate-region-code    float64
# dtype: object
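
The relationship between shape, size and dtypes can be checked on a small frame. This is a sketch on hypothetical data, not on the country file itself.

```python
import pandas as pd

# Hypothetical frame: one text column, one integer column, one float column.
df = pd.DataFrame({
    "name": ["Afghanistan", "Albania", "Algeria"],
    "country-code": [4, 8, 12],
    "region-code": [142.0, 150.0, 2.0],
})

rows, cols = df.shape  # shape is a (rows, columns) tuple
print(df.shape)        # (3, 3)
print(df.size)         # 9, which is rows * cols
print(df.dtypes)       # one dtype per column
```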

df.index
# gives the index of the data frame
# RangeIndex(start=0, stop=249, step=1)
df.values
# returns all the values as a 2D NumPy array
df.head() 
# gives the top five records
df.tail()
# gives the bottom five records
df.info()
# data types, non-null counts (which reveal missing values) and memory usage
df.describe()
# gives summary statistics for the numerical columns only
df.isnull().sum()
# counts the null values in every column
df.duplicated().sum()
# gives the number of duplicated rows
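
A short sketch of the two counting idioms above, on a hypothetical frame that contains one missing value and one repeated row:

```python
import pandas as pd

# Hypothetical data: row 2 repeats row 1, and the last region is missing.
df = pd.DataFrame({
    "name": ["Afghanistan", "Albania", "Albania", "Antarctica"],
    "region": ["Asia", "Europe", "Europe", None],
})

null_counts = df.isnull().sum()         # Series with one count per column
duplicate_rows = df.duplicated().sum()  # rows identical to an earlier row

print(null_counts)
print(duplicate_rows)
```

isnull() and duplicated() both return boolean results, so sum() simply counts the True values.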

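df.rename(columns={'name':'namess'})
# returns a copy with the column renamed; assign the result back to keep the change
df.sum(axis=1, numeric_only=True)
# row-wise sum; numeric_only=True skips the string columns
# axis=0 (the default) gives the column-wise sum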

df.columns
# gives the column names as an Index object (columns is an attribute, not a method)
df.sample()
# selects one random row of the data frame; pass n= to get more rows
# mathematical operations on a data frame

df.sum(numeric_only=True)   # sums every numerical column
df.mean(numeric_only=True)  # finds the mean of every numerical column
# for a row-wise sum, pass the additional parameter axis=1
df.sum(axis=1, numeric_only=True)
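
The renaming and aggregation calls above can be tried on a small frame. The data below is hypothetical, and numeric_only=True is used on the assumption that the frame mixes text and numeric columns, as the country file does.

```python
import pandas as pd

# Hypothetical frame with one text column and two numeric columns.
df = pd.DataFrame({
    "name": ["Afghanistan", "Albania"],
    "country-code": [4, 8],
    "region-code": [142.0, 150.0],
})

# rename returns a new frame; df itself keeps the old column name.
renamed = df.rename(columns={"name": "country"})

col_sums = df.sum(numeric_only=True)          # axis=0 (default): one value per column
row_sums = df.sum(axis=1, numeric_only=True)  # axis=1: one value per row
col_means = df.mean(numeric_only=True)

print(col_sums)
print(row_sums)
print(col_means)
```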

Fetching Columns

Using simple indexing

df['alpha-2']

# df['column_name'] returns a single column as a series

Returning multiple columns

df[['alpha-2','country-code']]
# the output columns appear in the order given inside the brackets
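
The difference between single and double brackets is easy to verify. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical two-row frame reusing column names from the notes.
df = pd.DataFrame({
    "name": ["Afghanistan", "Albania"],
    "alpha-2": ["AF", "AL"],
    "country-code": [4, 8],
})

one = df["alpha-2"]                    # single brackets: a Series
one_as_frame = df[["alpha-2"]]         # double brackets: a one-column DataFrame
two = df[["country-code", "alpha-2"]]  # columns come back in the requested order

print(type(one).__name__)           # Series
print(type(one_as_frame).__name__)  # DataFrame
print(list(two.columns))            # ['country-code', 'alpha-2']
```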

Fetching rows

iloc selects rows by integer position
loc selects rows by index label

examples: iloc

df.iloc[1:4]
# slicing: rows at positions 1, 2 and 3 (the end position is excluded)
df.iloc[[1,2,3]]
# fancy indexing (a list of positions) also works

example: loc

x = df.set_index('name')  # assumption: x is df with the country name as its index
x.loc['Afghanistan']
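
The sketch below contrasts position-based and label-based selection on hypothetical data. Using set_index('name') to get country names as labels is an assumption about how x was built, since the notes do not show it.

```python
import pandas as pd

# Hypothetical four-row frame.
df = pd.DataFrame({
    "name": ["Afghanistan", "Albania", "Algeria", "Andorra"],
    "country-code": [4, 8, 12, 20],
})

by_position = df.iloc[1:3]  # positions 1 and 2; the end position is excluded
picked = df.iloc[[0, 3]]    # a list of positions

x = df.set_index("name")                  # country names become the index labels
one_row = x.loc["Afghanistan"]            # a single row, returned as a Series
label_slice = x.loc["Albania":"Algeria"]  # label slices include both ends

print(by_position)
print(one_row)
print(label_slice)
```

Note the asymmetry: an iloc slice stops before its end position, while a loc slice includes its end label.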

Filtering a data frame

mask = df['country-code'] == 4  # boolean series, True where the condition holds
df[mask]  # keeps only the rows where mask is True
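
A mask is just a boolean series, so several conditions can be combined. This is a minimal sketch on hypothetical data; the second filter is an illustration and not part of the original notes.

```python
import pandas as pd

# Hypothetical three-row frame.
df = pd.DataFrame({
    "name": ["Afghanistan", "Albania", "Algeria"],
    "region": ["Asia", "Europe", "Africa"],
    "country-code": [4, 8, 12],
})

mask = df["country-code"] == 4  # boolean Series, one value per row
filtered = df[mask]

# Combine conditions with & (and), | (or), ~ (not); wrap each one in parentheses.
combined = df[(df["country-code"] > 4) & (df["region"] != "Africa")]

print(filtered)
print(combined)
```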

Changing the data type

df['country-code'] = df['country-code'].astype(float)  # astype returns a new series, so assign it back
df['country-cod
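
To confirm that a conversion took effect, the dtype can be compared before and after. A sketch on a hypothetical one-column frame:

```python
import pandas as pd

# Hypothetical frame with an integer column.
df = pd.DataFrame({"country-code": [4, 8, 12]})
before = df["country-code"].dtype
print(before)  # int64

# astype returns a converted copy, so it is assigned back to the column.
df["country-code"] = df["country-code"].astype(float)
after = df["country-code"].dtype
print(after)   # float64
```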
