Introduction

  1. In pandas, data arranged in rows and columns (that is, a table) is called a data frame (DataFrame).

  2. A single row or a single column of a data frame is called a Series.
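
To make the two terms concrete, here is a minimal sketch. The frame `toy` and its values are made up for illustration and are not part of the country data set used below.

```python
import pandas as pd

# A hand-made table; column names and values are illustrative only.
toy = pd.DataFrame({"name": ["Afghanistan", "Albania"], "code": [4, 8]})

col = toy["name"]   # a single column
row = toy.iloc[0]   # a single row

print(type(toy).__name__)  # DataFrame
print(type(col).__name__)  # Series
print(type(row).__name__)  # Series
```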

Reading Data

import pandas as pd

url = 'https://github.com/lukes/ISO-3166-Countries-with-Regional-Codes/blob/master/all/all.csv?raw=true'
df = pd.read_csv(url)
# the ?raw=true query parameter makes GitHub serve the raw CSV file instead of the HTML page
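
If the network is not available, the same call can be tried on text held in memory. This is only a sketch: `csv_text` is a made-up two-row stand-in, not the real file, and `io.StringIO` is used so that `read_csv` sees a file-like object.

```python
import io
import pandas as pd

# Made-up stand-in for a downloaded CSV file (illustrative values only).
csv_text = "name,alpha-2,country-code\nAfghanistan,AF,4\nAlbania,AL,8\n"

# read_csv accepts a path, a URL, or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)          # (2, 3)
print(list(sample.columns))  # ['name', 'alpha-2', 'country-code']
```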

Some important attributes and methods of a data frame

df.shape
# (249, 11) -> (rows, columns)
df.size
# total number of entries (rows x columns)
# 2739
df.dtypes
# gives the data type of each column
# in this output, string columns are reported as object
#name                         object
#alpha-2                      object
#alpha-3                      object
#country-code                  int64
#iso_3166-2                   object
#region                       object
#sub-region                   object
#intermediate-region          object
#region-code                 float64
#sub-region-code             float64
#intermediate-region-code    float64
#dtype: object
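
The relationship between `shape` and `size` can be checked on a small frame. A minimal sketch, with a made-up frame `toy` standing in for `df`:

```python
import pandas as pd

# Illustrative frame: 3 rows, 2 columns.
toy = pd.DataFrame({"name": ["A", "B", "C"], "code": [1, 2, 3]})

rows, cols = toy.shape   # shape is a (rows, columns) tuple
total = toy.size         # size is the total number of cells

print(toy.shape)  # (3, 2)
print(total)      # 6
```

For the country data the same arithmetic gives 249 x 11 = 2739.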

df.index
# RangeIndex(start=0, stop=249, step=1) 
# gives the index of the data frame
df.values
# all the values will be put in a numpy 2d array 
df.head() 
# gives the top five records
df.tail()
# gives the bottom five records
df.info()
# data types, non-null counts (which reveal missing values), and memory utilization
df.describe()
# gives summary statistics (count, mean, std, min, quartiles, max); by default only the numeric columns are included
df.isnull().sum()
## counts the null values in every column 
df.duplicated().sum()
# gives the number of duplicated rows
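
A small sketch of the two counts above, on a made-up frame `toy` that contains one missing value and one repeated row:

```python
import pandas as pd

# Illustrative frame: row 2 repeats row 1, and the last region is missing.
toy = pd.DataFrame({
    "name": ["A", "B", "B", "C"],
    "region": ["Asia", "Europe", "Europe", None],
})

missing_per_column = toy.isnull().sum()   # one count per column
duplicate_rows = toy.duplicated().sum()   # rows that repeat an earlier row

print(missing_per_column["region"])  # 1
print(duplicate_rows)                # 1
```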

df.rename(columns={'name':'namess'})
## changes the name of a column; this returns a new data frame, and df itself is unchanged unless the result is assigned back
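
A minimal sketch showing that `rename` returns a new object. The frame `toy` and the new column name `country` are made up for the example.

```python
import pandas as pd

toy = pd.DataFrame({"name": ["Afghanistan"], "code": [4]})

# rename returns a new data frame; toy keeps its original column names.
renamed = toy.rename(columns={"name": "country"})

print(list(toy.columns))      # ['name', 'code']
print(list(renamed.columns))  # ['country', 'code']
```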
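df.sum(axis=1, numeric_only=True)
# row-wise sum; numeric_only=True skips the string columns, which recent pandas versions no longer drop automatically
# column-wise sum is the default (axis=0); it is not deprecated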

df.columns
# gives the names of the columns as an Index object (columns is an attribute, so it is used without parentheses)
df.sample()
## selects random rows of a data frame (one row by default; pass n to get more)
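
A short sketch of `sample` on a made-up frame `toy`. The `random_state` value is arbitrary and is only there to make the draw repeatable.

```python
import pandas as pd

toy = pd.DataFrame({"name": ["A", "B", "C", "D"], "code": [1, 2, 3, 4]})

one_row = toy.sample()                      # one random row
two_rows = toy.sample(n=2, random_state=0)  # two random rows, repeatable

print(len(one_row))   # 1
print(len(two_rows))  # 2
```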
## mathematical operations on a data frame

df.sum(numeric_only=True) ## sums each numeric column
df.mean(numeric_only=True) ## finds the mean of each numeric column
## for a row-wise sum, pass the additional parameter axis=1
df.sum(axis=1, numeric_only=True)
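
The difference between the two axes is easier to see on a small frame. A minimal sketch, assuming a made-up frame `toy` with one string column and two numeric columns:

```python
import pandas as pd

toy = pd.DataFrame({
    "name": ["A", "B"],
    "x": [1, 2],
    "y": [10.0, 20.0],
})

col_sums = toy.sum(numeric_only=True)          # one value per numeric column
col_means = toy.mean(numeric_only=True)        # one value per numeric column
row_sums = toy.sum(axis=1, numeric_only=True)  # one value per row

print(col_sums["x"], col_sums["y"])  # 3.0 30.0
print(list(row_sums))                # [11.0, 22.0]
```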

Fetching Columns

  1. Using simple indexing

     df['alpha-2']
    
     ## df['column_name'] returns a single column as a Series
    
  2. Selecting multiple columns

     df[['alpha-2','country-code']]
     ## pass a list of column names; the result is a data frame whose columns follow the order given in the list
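
Both forms of column selection, sketched on a made-up frame `toy` with hypothetical column names `a`, `b`, and `c`:

```python
import pandas as pd

toy = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

one = toy["b"]         # single brackets: a Series
two = toy[["c", "a"]]  # list of names: a data frame, in the requested order

print(type(one).__name__)  # Series
print(list(two.columns))   # ['c', 'a']
```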
    

Fetching rows

  1. iloc selects rows by integer position (0, 1, 2, ...)

  2. loc selects rows by index label

examples: iloc

df.iloc[1:4] 
## slicing by position: rows 1, 2 and 3 (the end position is excluded)
df.iloc[[1,2,3]]
## fancy indexing also works
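
A small sketch, on a made-up frame `toy`, showing that the slice and the list of positions pick the same rows:

```python
import pandas as pd

toy = pd.DataFrame({"v": [10, 20, 30, 40, 50]})

by_slice = toy.iloc[1:4]       # positions 1, 2, 3; position 4 is excluded
by_list = toy.iloc[[1, 2, 3]]  # the same positions, listed explicitly

print(list(by_slice["v"]))       # [20, 30, 40]
print(by_slice.equals(by_list))  # True
```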

example: loc

x = df.set_index('name')  # assumption: x is df with the name column as its index
x.loc['Afghanistan']
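
Label-based lookup needs an index that carries the labels. The sketch below assumes `x` is a frame indexed by country name. It builds such a frame from made-up rows rather than from the real data.

```python
import pandas as pd

toy = pd.DataFrame({
    "name": ["Afghanistan", "Albania", "Algeria"],
    "code": [4, 8, 12],
})
x = toy.set_index("name")  # the name column becomes the index labels

row = x.loc["Afghanistan"]             # one row, returned as a Series
span = x.loc["Afghanistan":"Albania"]  # label slice: both ends are included

print(row["code"])  # 4
print(len(span))    # 2
```

Unlike iloc, a loc slice includes its end label.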

Filtering a data frame

mask=df['country-code']==4
df[mask]
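
The mask is a boolean Series with one True or False per row, and several conditions can be combined with & and |. A minimal sketch on a made-up frame `toy`:

```python
import pandas as pd

toy = pd.DataFrame({
    "name": ["Afghanistan", "Albania", "Algeria"],
    "region": ["Asia", "Europe", "Africa"],
    "code": [4, 8, 12],
})

mask = toy["code"] == 4  # boolean Series: True, False, False
single = toy[mask]       # keeps only the rows where mask is True

# Each condition needs its own parentheses when combined with & or |.
both = toy[(toy["code"] > 4) & (toy["region"] == "Europe")]

print(list(single["name"]))  # ['Afghanistan']
print(list(both["name"]))    # ['Albania']
```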

Changing the data type

df['country-code']=df['country-code'].astype(float)
df['country-cod