EDA on Zomato Bengaluru Data Set
Introduction to EDA
Intuition
Imagine your wolf pack decides to watch a movie you haven’t heard of. There is no debate about it: you will find yourself puzzled, with a lot of questions that need to be answered before you can make a decision. Being a good chieftain, the first question you would ask is: what is the cast and crew of the movie? As a regular practice, you would also watch the trailer of the movie on YouTube. Furthermore, you’d find out what ratings and reviews the movie has received from the audience.
Whatever investigative measures you take before finally buying popcorn for your clan at the theater are nothing but what data scientists, in their lingo, call ‘Exploratory Data Analysis’.
Definition of EDA
Exploratory Data Analysis refers to the critical process of performing initial investigations on data so as to discover patterns, spot anomalies, test hypotheses, and check assumptions with the help of summary statistics and graphical representations.
In simple language, all we are trying to do is understand the dataset.
Performing EDA on Zomato Data Set (Bengaluru)
Loading the Data Set
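A minimal sketch of the loading step, assuming the data ships as a CSV file. The file name `zomato.csv`, the helper `load_zomato`, and the sample rows are illustrative assumptions, not taken from the original notebook; the in-memory sample only exists so the snippet runs without the real file.

```python
import io

import pandas as pd

# Hypothetical file name; replace it with the real path to the Zomato CSV.
CSV_PATH = "zomato.csv"


def load_zomato(source):
    """Read the data from a file path or file-like object into a DataFrame."""
    return pd.read_csv(source)


# Tiny made-up stand-in for the real file, so the sketch is runnable as-is.
sample_csv = io.StringIO(
    "name,online_order,book_table,rate\n"
    "Cafe A,Yes,No,4.1/5\n"
    "Cafe B,No,No,3.8/5\n"
)

df = load_zomato(sample_csv)  # with the real data: load_zomato(CSV_PATH)

print(df.shape)   # (number of rows, number of columns)
print(df.head())  # first few rows, for a quick visual check
```

With the real file, `df.shape` is a quick way to confirm the number of columns matches the reference sheet.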
Looking through the reference sheet to understand what each feature name stands for
We have 17 columns, including:
url(object): website of the restaurant
address(object): actual address of the restaurant
online-order(object): categorical data (yes/no); do they take online orders?
book-table(object): categorical data (yes/no); can we make a reservation, or is it on a first-come, first-served basis?
rate(object): rating of the restaurant on a scale of 5; in the dataset it is given as tr
How to find the data type of each feature
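One common way to do this in pandas is the `dtypes` attribute, with `info()` as a more verbose alternative. The sketch below uses a small made-up frame; the column names and values are placeholders, not the real data.

```python
import pandas as pd

# Made-up rows standing in for the real dataset.
df = pd.DataFrame(
    {
        "online_order": ["Yes", "No"],
        "rate": ["4.1/5", "3.8/5"],
        "votes": [775, 787],
    }
)

# One dtype per column; text columns are reported as a string-like dtype.
dtypes = df.dtypes
print(dtypes)

# info() additionally shows non-null counts and memory usage.
df.info()
```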
Making Some Inferences from the Data Types
pandas stores strings under the object data type. Hence we need to change the data type of the rate column, since we know its values should be floats; we also need to remove the "/5" suffix, as every rating is out of 5.
Our machine learning model can only understand numbers, hence in the categorical columns we will replace every "no" with 0 and every "yes" with 1.
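The rating cleanup could be sketched as follows, assuming the column is named `rate` and holds strings such as `"4.1/5"`. The sample values, including the non-numeric `"unrated"` entry, are made up for illustration. `pd.to_numeric(..., errors="coerce")` is one possible choice here: it turns anything that cannot be parsed into NaN instead of raising an error.

```python
import pandas as pd

# Made-up sample: two ratings, one missing value, one non-numeric entry.
df = pd.DataFrame({"rate": ["4.1/5", "3.8 /5", None, "unrated"]})

# Strip the literal "/5" suffix and any surrounding whitespace.
cleaned = df["rate"].str.replace("/5", "", regex=False).str.strip()

# Convert to numbers; unparseable or missing entries become NaN.
df["rate"] = pd.to_numeric(cleaned, errors="coerce")

print(df)
print(df["rate"].dtype)
```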
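A minimal sketch of the yes/no encoding, assuming the two categorical columns are named `online_order` and `book_table` (the reference sheet writes them with hyphens, so adjust the names to match the actual file). The sample rows are made up.

```python
import pandas as pd

# Made-up sample rows for the two yes/no columns.
df = pd.DataFrame(
    {
        "online_order": ["Yes", "No", "Yes"],
        "book_table": ["No", "No", "Yes"],
    }
)

# Lookup table for the encoding: no -> 0, yes -> 1.
yes_no = {"yes": 1, "no": 0}

for col in ["online_order", "book_table"]:
    # Normalise case and whitespace first so "Yes", "yes " and "YES" all match.
    df[col] = df[col].str.strip().str.lower().map(yes_no)

print(df)
```

Any value that is neither "yes" nor "no" maps to NaN under this approach, which makes unexpected categories easy to spot.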
Checking for missing values
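A short sketch of how missing values can be counted per column with `isna()`. The frame below is made up and the column names are assumptions.

```python
import numpy as np
import pandas as pd

# Made-up sample with a few missing entries.
df = pd.DataFrame(
    {
        "rate": ["4.1/5", np.nan, "3.8/5", np.nan],
        "online_order": ["Yes", "No", None, "Yes"],
    }
)

# Number of missing values in each column.
missing_counts = df.isna().sum()

# The same information as a percentage of all rows.
missing_pct = (df.isna().mean() * 100).round(2)

print(missing_counts)
print(missing_pct)
```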
Before converting the data type, we need to handle the NaN values
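The text does not say which strategy is used for the NaN values, so the following is only one possible sketch: dropping the rows that have no rating. The column names and rows are made up; filling the gaps or keeping them as NaN are equally valid alternatives, depending on the analysis.

```python
import numpy as np
import pandas as pd

# Made-up sample in which one restaurant has no rating.
df = pd.DataFrame(
    {
        "name": ["Cafe A", "Cafe B", "Cafe C"],
        "rate": ["4.1/5", np.nan, "3.8/5"],
    }
)

# Assumed strategy: keep only the rows where a rating is present.
df_clean = df.dropna(subset=["rate"]).reset_index(drop=True)

print(len(df), "rows before,", len(df_clean), "rows after")
print(df_clean)
```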