Introduction
In pandas, data that has rows and columns (that is, data in the form of a table) is called a DataFrame.
A single row or a single column of a DataFrame is called a Series.
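A minimal sketch showing that selecting one column, or one row, of a DataFrame gives a Series. The three rows are an illustrative hand-made excerpt, not the full countries dataset.

```python
import pandas as pd

# Small illustrative frame using a few of the countries dataset's columns
df = pd.DataFrame({
    "name": ["Afghanistan", "Albania", "Algeria"],
    "alpha-2": ["AF", "AL", "DZ"],
    "country-code": [4, 8, 12],
})

column = df["name"]   # one column -> Series
row = df.iloc[0]      # one row -> Series whose index is the column names

print(type(df).__name__)      # DataFrame
print(type(column).__name__)  # Series
print(type(row).__name__)     # Series
```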
Reading Data
import pandas as pd
url = 'https://github.com/lukes/ISO-3166-Countries-with-Regional-Codes/blob/master/all/all.csv?raw=true'
df = pd.read_csv(url)
# the ?raw=true query parameter makes GitHub return the raw CSV file instead of the HTML page
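`read_csv` also accepts any file-like object, so the parsing can be tried offline. This sketch feeds it a few lines in an assumed `all.csv`-like layout (illustrative excerpt) through `io.StringIO`. It also shows that a zero-padded code such as `004` is parsed as the integer `4` unless a `dtype` is given.

```python
import io
import pandas as pd

# A few lines in an assumed all.csv-like layout (illustrative, not the real file)
csv_text = """name,alpha-2,alpha-3,country-code
Afghanistan,AF,AFG,004
Albania,AL,ALB,008
Algeria,DZ,DZA,012
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)                     # (3, 4)
print(df["country-code"].tolist())  # [4, 8, 12]: leading zeros are dropped

# Keep the codes as text to preserve the zero padding
df_str = pd.read_csv(io.StringIO(csv_text), dtype={"country-code": str})
print(df_str["country-code"].tolist())  # ['004', '008', '012']
```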
Some important DataFrame attributes and methods
df.shape
# (249, 11) -> (rows, columns)
df.size
# total number of cells (rows x columns): 249 x 11 = 2739
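A quick check, on a small made-up frame, that `size` is simply rows times columns and counts missing cells too:

```python
import pandas as pd

# Tiny illustrative frame with one missing value
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, None, 6.0]})

rows, cols = df.shape
print(df.shape)                # (3, 2)
print(df.size)                 # 6
print(df.size == rows * cols)  # True: every cell is counted, including the missing one
```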
df.dtypes
# gives the data type of each column
# strings are stored with the object dtype in pandas
# name                         object
# alpha-2                      object
# alpha-3                      object
# country-code                  int64
# iso_3166-2                   object
# region                       object
# sub-region                   object
# intermediate-region          object
# region-code                 float64
# sub-region-code             float64
# intermediate-region-code    float64
# dtype: object
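One plausible reason the `*-code` columns are `float64` rather than `int64` is that an integer column containing a missing value (NaN) is stored as float. A small sketch with made-up rows (Antarctica is assumed here to lack a region code). Note that newer pandas releases may report a dedicated string dtype instead of `object` for text columns.

```python
import numpy as np
import pandas as pd

# Illustrative rows; the missing region code is an assumption for the example
df = pd.DataFrame({
    "name": ["Afghanistan", "Antarctica"],
    "country-code": [4, 10],
    "region-code": [142, np.nan],  # NaN forces the column to float64
})
print(df.dtypes)
# name: object (text), country-code: int64, region-code: float64
```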
df.index
# the row labels (index) of the DataFrame; by default a RangeIndex from 0 to n-1
# RangeIndex(start=0, stop=249, step=1)
df.values
# returns all the values as a 2-D NumPy array; df.to_numpy() is the recommended equivalent
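The array has a single dtype for all cells, so pandas picks a common type: mixed integer and float columns become `float64`, and any string column would force `object`. A sketch on two illustrative rows:

```python
import pandas as pd

# Illustrative numeric-only frame
df = pd.DataFrame({"country-code": [4, 8], "region-code": [142.0, 150.0]})

arr = df.to_numpy()        # same data as df.values
print(type(arr).__name__)  # ndarray
print(arr.shape)           # (2, 2)
print(arr.dtype)           # float64: one common dtype for the whole array
```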
df.head()
# returns the first five rows (df.head(n) returns the first n)
df.tail()
# returns the last five rows (df.tail(n) returns the last n)
df.info()
# prints each column's data type, its non-null count (which reveals missing values) and the memory usage
df.describe()
# summary statistics (count, mean, std, min, quartiles, max); by default only the numeric columns are included
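A sketch on illustrative rows showing that `describe()` skips the text column unless `include="all"` is passed:

```python
import pandas as pd

# Illustrative frame with one text column and one numeric column
df = pd.DataFrame({
    "name": ["Afghanistan", "Albania", "Algeria"],
    "country-code": [4, 8, 12],
})

summary = df.describe()                      # numeric columns only
print(summary.columns.tolist())              # ['country-code']
print(summary.loc["mean", "country-code"])   # 8.0

full = df.describe(include="all")            # also summarizes 'name' (count, unique, top, freq)
print(sorted(full.columns.tolist()))         # ['country-code', 'name']
```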
df.isnull().sum()
## counts the null values in every column
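`isnull()` returns a same-shaped frame of booleans, and `sum()` counts the `True` values per column. A sketch with made-up missing entries:

```python
import numpy as np
import pandas as pd

# Illustrative rows; the missing region values are assumptions for the example
df = pd.DataFrame({
    "name": ["Afghanistan", "Antarctica", "Albania"],
    "region": ["Asia", np.nan, "Europe"],
    "region-code": [142, np.nan, 150],
})

missing = df.isnull().sum()  # True counts as 1, False as 0
print(missing.to_dict())     # {'name': 0, 'region': 1, 'region-code': 1}
```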
df.duplicated().sum()
# gives the number of duplicated rows (rows identical to an earlier row)
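`duplicated()` works row by row: a row is flagged only if an identical row appeared earlier, so the first copy is not counted. A sketch with a deliberately repeated, made-up row:

```python
import pandas as pd

# Illustrative frame where the third row repeats the first
df = pd.DataFrame({
    "name": ["Afghanistan", "Albania", "Afghanistan"],
    "alpha-2": ["AF", "AL", "AF"],
})

print(df.duplicated().tolist())  # [False, False, True]
print(df.duplicated().sum())     # 1

deduped = df.drop_duplicates()   # keeps the first occurrence of each row
print(len(deduped))              # 2
```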
df.rename(columns={'name':'namess'})
## returns a new DataFrame with column 'name' renamed to 'namess'; df itself is unchanged unless you assign the result
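A sketch on an illustrative one-row frame showing that `rename` does not modify the original, so the result has to be kept:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Afghanistan"], "alpha-2": ["AF"]})

renamed = df.rename(columns={"name": "namess"})
print(df.columns.tolist())       # ['name', 'alpha-2']: original unchanged
print(renamed.columns.tolist())  # ['namess', 'alpha-2']

df = renamed                     # reassign to keep the new name
```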
df.sum(axis=1, numeric_only=True)
# row-wise sum (axis=1) over the numeric columns
# the column-wise sum uses axis=0, which is the default; it is not deprecated
df.columns
# the column names, as an Index object (columns is an attribute, not a method, so no parentheses)
df.sample()
## returns one randomly selected row; df.sample(n=5) returns five random rows
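`sample()` draws rows by default; `random_state` makes the draw reproducible, and `axis=1` samples columns instead. A sketch on illustrative rows:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Afghanistan", "Albania", "Algeria", "Andorra"],
    "country-code": [4, 8, 12, 20],
})

one_row = df.sample()                              # 1 random row
two_rows = df.sample(n=2, random_state=0)          # reproducible sample of 2 rows
one_col = df.sample(n=1, axis=1, random_state=0)   # 1 random column, all rows

print(one_row.shape)   # (1, 2)
print(two_rows.shape)  # (2, 2)
print(one_col.shape)   # (4, 1)
```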
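## mathematical operations on a DataFrame
## numeric_only=True restricts the calculation to numeric columns, so text columns such as 'name' are skipped
df.sum(numeric_only=True)   ## sums each numeric column
df.mean(numeric_only=True)  ## mean of each numeric column
## for a row-wise sum, pass the additional parameter axis=1
df.sum(axis=1, numeric_only=True)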
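A sketch on two illustrative rows contrasting the default column-wise reduction (`axis=0`) with the row-wise one (`axis=1`); `numeric_only=True` keeps the text column out of the arithmetic.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Afghanistan", "Albania"],
    "country-code": [4, 8],
    "region-code": [142.0, 150.0],
})

col_sums = df.sum(numeric_only=True)          # one value per numeric column
row_sums = df.sum(axis=1, numeric_only=True)  # one value per row
col_means = df.mean(numeric_only=True)

print(col_sums["country-code"], col_sums["region-code"])  # 12 and 292
print(row_sums.tolist())                                  # [146.0, 158.0]
print(col_means.to_dict())  # {'country-code': 6.0, 'region-code': 146.0}
```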
Fetching Columns
Using simple indexing
df['alpha-2'] ## df['column_name'] returns a single column as a Series
Fetching multiple columns
df[['alpha-2','country-code']] ## a list of column names returns a DataFrame; the columns appear in the order given in the list
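Single brackets give a Series and a list inside the brackets gives a DataFrame, with columns in the listed order. A sketch on illustrative rows:

```python
import pandas as pd

df = pd.DataFrame({
    "alpha-2": ["AF", "AL"],
    "alpha-3": ["AFG", "ALB"],
    "country-code": [4, 8],
})

s = df["alpha-2"]                      # Series
sub = df[["country-code", "alpha-2"]]  # DataFrame, columns in the listed order

print(type(s).__name__)      # Series
print(type(sub).__name__)    # DataFrame
print(sub.columns.tolist())  # ['country-code', 'alpha-2']
```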
Fetching rows
iloc selects rows by integer position (0, 1, 2, ...)
loc selects rows by index label
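Examples: iloc
df.iloc[1:4]
## slicing by position: returns the rows at positions 1, 2 and 3 (the end position is excluded)
df.iloc[[1,2,3]]
## fancy indexing (passing a list of positions) also works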
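A sketch of positional selection on five illustrative rows, including a negative position:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"]})

print(df.iloc[1:4]["name"].tolist())     # ['Albania', 'Algeria', 'Andorra']: position 4 excluded
print(df.iloc[[0, 4]]["name"].tolist())  # ['Afghanistan', 'Angola']
print(df.iloc[-1]["name"])               # Angola: negative positions count from the end
```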
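Example: loc
x = df.set_index('name') ## assumed: x is df with the country names as its index; with the default RangeIndex, df.loc['Afghanistan'] raises a KeyError
x.loc['Afghanistan']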
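A sketch assuming, as above, that the row labels are the country names (set here with `set_index("name")` on illustrative rows). Unlike `iloc`, a label slice with `loc` includes its end label.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Afghanistan", "Albania", "Algeria"],
    "alpha-2": ["AF", "AL", "DZ"],
})

x = df.set_index("name")                # country names become the row labels
print(x.loc["Afghanistan", "alpha-2"])  # AF
print(x.loc["Afghanistan":"Albania"].index.tolist())  # ['Afghanistan', 'Albania']: end label included
```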
Filtering a data frame
mask = df['country-code'] == 4  # boolean Series: True where the condition holds
df[mask]                        # keeps only the rows where mask is True
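Conditions can be combined with `&` (and) and `|` (or); each condition needs its own parentheses because these operators bind more tightly than `==` and `>`. A sketch on illustrative rows:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Afghanistan", "Albania", "Algeria"],
    "region": ["Asia", "Europe", "Africa"],
    "country-code": [4, 8, 12],
})

mask = df["country-code"] == 4
print(mask.tolist())              # [True, False, False]
print(df[mask]["name"].tolist())  # ['Afghanistan']

# Combine two boolean masks element-wise
both = df[(df["country-code"] > 4) & (df["region"] == "Europe")]
print(both["name"].tolist())      # ['Albania']
```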
Changing the data type
df['country-code']=df['country-code'].astype(float)
df['country-cod